Business Software for Hadoop - Page 6

Top Software that integrates with Hadoop as of August 2025 - Page 6

  • 1
    BicDroid

    BicDroid

    Installed in your Intranet, QWS Server integrates all channels and tools for managing and controlling QWS Endpoints. It intelligently monitors all active QWS Endpoints in a way similar to how airplanes and spaceships in flight are monitored by ground stations. Installed on a personal or corporate-managed computer (the “Host”), QWS Endpoint creates on the Host a fully secure quarantined work environment (i.e., QWS), which is a fully secure extension of your corporate Intranet work environment. Data inside QWS is quarantined from the Host as well as any other network or Internet resource that is not explicitly allowed by your corporate policy. Using QWS for work, employees are more productive than before. QWS Connector creates a fully secure tunnel between each QWS Endpoint and configured corporate Intranet(s). The encrypted tunnel is established on-demand, enabling employees to use QWS to work offline without connecting to the Intranet.
  • 2
    Azkaban

    Azkaban

    Azkaban is a distributed workflow manager, implemented at LinkedIn to solve the problem of Hadoop job dependencies. We had jobs that needed to run in order, from ETL jobs to data analytics products. After version 3.0, we provide two modes: the standalone “solo-server” mode and the distributed multiple-executor mode. In solo-server mode, the DB is an embedded H2 database, and both the web server and the executor server run in the same process. This is useful if you just want to try things out, and it can also be used for small-scale use cases. The multiple-executor mode is intended for serious production environments. Its DB should be backed by MySQL instances in a master-slave setup. The web server and executor servers should ideally run on different hosts so that upgrades and maintenance don’t affect users. This multi-host setup makes Azkaban more robust and scalable.
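The job-dependency problem described above comes down to running jobs in an order that respects what each job depends on. The sketch below is not Azkaban's API or job file format. The job names are hypothetical, and Python's standard `graphlib` module is used only to illustrate the idea.

```python
from graphlib import TopologicalSorter

# Hypothetical jobs: each key maps to the set of jobs it depends on.
# These names are illustrative and do not come from Azkaban.
JOB_DEPENDENCIES = {
    "extract": set(),
    "transform": {"extract"},
    "load": {"transform"},
    "analytics_report": {"load"},
}


def execution_order(dependencies):
    """Return one valid run order in which every job follows its dependencies."""
    return list(TopologicalSorter(dependencies).static_order())


order = execution_order(JOB_DEPENDENCIES)
print(order)
```

A workflow manager adds scheduling, retries, and distribution across executors on top of this ordering step.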
  • 3
    DigDash

    DigDash

    Every day, your business generates vast amounts of data. Used correctly, this data is invaluable. Aggregated, this strategic information opens up an ocean of opportunities. As a business intelligence expert, DigDash provides a reliable solution that helps you put your data to work and improve your performance today. From design to deployment, from questions of use to development needs, DigDash is by your side for the long term, in a close relationship. We are committed to continuous improvement, and flexibility is at the heart of our DNA. Our software stands out for its ease of use at all levels. The solution is recognized as one of the most powerful on the market. Whatever your operational vision, our tool adapts to the specifics of your business. Thanks to clear real-time visibility into all your activities, from marketing to finance, from sales to HR, your managers are able to make rational decisions at the right time.
  • 4
    Semarchy xDI
    Experience Semarchy’s flexible unified data platform to empower better business decisions enterprise-wide. Integrate all your data with xDI, the high-performance, agile, and extensible data integration for all styles and use cases. Its single technology federates all forms of data integration, and mapping converts business rules into deployable code. xDI has extensible and open architecture supporting on-premise, cloud, hybrid, and multi-cloud environments.
  • 5
    Yottamine

    Yottamine

    Our highly innovative machine learning technology is designed specifically to accurately predict financial time series where only a small number of training data points are available. Advanced AI is computationally intensive. YottamineAI leverages the cloud to eliminate the need to invest time and money in managing hardware, significantly shortening the time it takes to see a higher ROI. Strong encryption and protection of keys ensure trade secrets stay safe. We follow AWS best practices and use strong encryption to secure your data. We evaluate how your existing or future data can generate predictive analytics to help you make information-based decisions. If you need predictive analytics on a project basis, Yottamine Consulting Services provides project-based consulting to accommodate your data-mining needs.
  • 6
    Wherobots

    Wherobots

    Wherobots enables users to easily develop, test, and deploy geospatial data analytics and AI pipelines within their existing data stack, which can be deployed in the cloud. Users do not have to worry about the hassle of resource administration, workload scalability, and geospatial processing support/optimization. Connect your Wherobots account to the cloud database where the data is stored using our SaaS web interface. Develop your geospatial data science, machine learning, or analytics application using the Sedona Developer Tool. Schedule automatic deployment of your geospatial pipeline to the cloud data platform and monitor its performance in Wherobots. Consume the outcome of your geospatial analytics task. The consumption model can be through a single geospatial map visualization or API calls.
  • 7
    Apache Mahout

    Apache Software Foundation

    Apache Mahout is a powerful, scalable, and versatile machine learning library designed for distributed data processing. It offers a comprehensive set of algorithms for various tasks, including classification, clustering, recommendation, and pattern mining. Built on top of the Apache Hadoop ecosystem, Mahout leverages MapReduce and Spark to enable data processing on large-scale datasets. Apache Mahout(TM) is a distributed linear algebra framework and mathematically expressive Scala DSL designed to let mathematicians, statisticians, and data scientists quickly implement their own algorithms. Apache Spark is the recommended out-of-the-box distributed backend, and Mahout can be extended to other distributed backends. Matrix computations are a fundamental part of many scientific and engineering applications, including machine learning, computer vision, and data analysis.
  • 8
    Determined AI

    Determined AI

    Train in a distributed fashion without changing your model code: Determined takes care of provisioning machines, networking, data loading, and fault tolerance. Our open source deep learning platform enables you to train models in hours and minutes, not days and weeks, and frees you from arduous tasks like manual hyperparameter tuning, re-running faulty jobs, and worrying about hardware resources. Our distributed training implementation outperforms the industry standard, requires no code changes, and is fully integrated with our state-of-the-art training platform. With built-in experiment tracking and visualization, Determined records metrics automatically, makes your ML projects reproducible, and allows your team to collaborate more easily. Your researchers will be able to build on the progress of their team and innovate in their domain, instead of fretting over errors and infrastructure.
  • 9
    Informatica Dynamic Data Masking
    Your IT organization can apply sophisticated masking to limit sensitive data access with flexible data masking rules based on a user’s authentication level. Blocking, auditing, and alerting your users, IT personnel, and outsourced teams who access sensitive information, it ensures compliance with your security policies and industry and civil privacy regulations. Easily customize data-masking solutions for different regulatory or business requirements. Protect personal and sensitive information while supporting offshoring, outsourcing, and cloud-based initiatives. Secure big data by dynamically masking sensitive data in Hadoop.
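The idea of masking rules that depend on a user's authorization level can be sketched in a few lines. This is a minimal illustration in plain Python, not Informatica's product or API. The role names and the `MASKING_RULES` policy are hypothetical.

```python
import re

# Hypothetical policy: role -> how much of a card number that role may see.
MASKING_RULES = {
    "admin": "full",
    "support": "last4",
    "analyst": "none",
}


def mask_card_number(value, role):
    """Return the card number as the given role is allowed to see it."""
    rule = MASKING_RULES.get(role, "none")  # unknown roles see nothing
    digits = re.sub(r"\D", "", value)  # drop spaces and separators
    if rule == "full":
        return digits
    if rule == "last4":
        return "*" * (len(digits) - 4) + digits[-4:]
    return "*" * len(digits)


print(mask_card_number("4111 1111 1111 1111", "support"))
```

The stored value is never changed. Masking is applied when the data is read, which is what makes it dynamic.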
  • 10
    Baidu Palo

    Baidu AI Cloud

    Palo helps enterprises create a PB-level MPP data warehouse service within minutes and import massive data from RDS, BOS, and BMR, so that Palo can perform multi-dimensional analytics on big data. Palo is compatible with mainstream BI tools. Data analysts can analyze and display the data visually and gain insights quickly to assist decision-making. It has an industry-leading MPP query engine with column storage, intelligent indexing, and vector execution. It also provides in-library analytics, window functions, and other advanced analytics functions. You can create a materialized view and change the table structure without suspending service. It supports flexible and efficient data recovery.
  • 11
    LightBeam.ai

    LightBeam.ai

    Discover within minutes if sensitive information lurks in places you never expected (screenshots, logs, tickets, messages, tables). With one click, LightBeam can easily generate executive or delta reports to gain valuable insights into your sensitive data. Automate DSRs leveraging LightBeam's unique PII/PHI graphs comprehensively created from your data infrastructure. Build trust with your users by empowering them to exercise control over their data collection. Continuously monitor how sensitive data is collected, used, shared, and maintained with appropriate safeguards within your organization.
  • 12
    Salesforce Data Cloud
    Salesforce Data Cloud is a real-time data platform designed to unify and manage customer data from multiple sources across an organization, enabling a single, comprehensive view of each customer. It allows businesses to collect, harmonize, and analyze data in real time, creating a 360-degree customer profile that can be leveraged across Salesforce’s various applications, such as Marketing Cloud, Sales Cloud, and Service Cloud. This platform enables faster, more personalized customer interactions by integrating data from online and offline channels, including CRM data, transactional data, and third-party data sources. Salesforce Data Cloud also offers advanced AI agents and analytics capabilities, helping organizations gain deeper insights into customer behavior and predict future needs. By centralizing and refining data for actionable use, Salesforce Data Cloud supports enhanced customer experiences, targeted marketing, and efficient, data-driven decision-making across departments.
  • 13
    Azure Marketplace
    Azure Marketplace is a comprehensive online store that provides access to thousands of certified, ready-to-use software applications, services, and solutions from Microsoft and third-party vendors. It enables businesses to discover, purchase, and deploy software directly within the Azure cloud environment. The marketplace offers a wide range of products, including virtual machine images, AI and machine learning models, developer tools, security solutions, and industry-specific applications. With flexible pricing options like pay-as-you-go, free trials, and subscription models, Azure Marketplace simplifies the procurement process and centralizes billing through a single Azure invoice. It supports seamless integration with Azure services, enabling organizations to enhance their cloud infrastructure, streamline workflows, and accelerate digital transformation initiatives.
  • 14
    AWS DataSync
    AWS DataSync is a secure, online service that automates and accelerates moving data between on-premises storage and AWS Storage services. It simplifies migration planning and reduces expensive on-premises data movement costs with a fully managed service that seamlessly scales as data loads increase. DataSync can copy data between Network File System (NFS) shares, Server Message Block (SMB) shares, Hadoop Distributed File Systems (HDFS), self-managed object storage, AWS Snowcone, Amazon Simple Storage Service (Amazon S3) buckets, Amazon Elastic File System (Amazon EFS) file systems, Amazon FSx for Windows File Server file systems, Amazon FSx for Lustre file systems, Amazon FSx for OpenZFS file systems, and Amazon FSx for NetApp ONTAP file systems. It also supports moving data between other public clouds and AWS Storage services, enabling replication, archival, or sharing of application data easily. DataSync provides end-to-end security, including data encryption and data integrity.
  • 15
    MLlib

    Apache Software Foundation

    Apache Spark's MLlib is a scalable machine learning library that integrates seamlessly with Spark's APIs, supporting Java, Scala, Python, and R. It offers a comprehensive suite of algorithms and utilities, including classification, regression, clustering, collaborative filtering, and tools for constructing machine learning pipelines. MLlib's high-quality algorithms leverage Spark's iterative computation capabilities, delivering performance up to 100 times faster than traditional MapReduce implementations. It is designed to operate across diverse environments, running on Hadoop, Apache Mesos, Kubernetes, standalone clusters, or in the cloud, and accessing various data sources such as HDFS, HBase, and local files. This flexibility makes MLlib a robust solution for scalable and efficient machine learning tasks within the Apache Spark ecosystem.
  • 16
    PacketRanger
    PacketRanger is a web-based SaaS platform that effortlessly builds and manages telemetry pipelines across the entire IT landscape by inspecting, filtering, replicating, and routing data from any source to an unlimited number of destination consumers. It enables rapid construction of pipelines that eliminate noise, establishes volumetric baselines with customizable threshold notifications, and provides rich visualizations to pinpoint low- and high-value data as well as network issues and misconfigurations. Designed for NetFlow, it moderates congestion, optimizes flow-based licensing, reduces duplicate UDP datagrams, supports all NetFlow/IPFIX versions, offers over 400 predefined and custom filter templates, mitigates packet loss, and overcomes exporter limitations. For Syslog, it ensures balanced event distribution, simple keyword and regular-expression filtering, TCP/TLS support, automatic message parsing without manual grok patterns, and the ability to transform logs into SNMP traps.
  • 17
    FICO Xpress Optimization
    The most complete set of optimization software technology and tools. Solving large complex optimization problems can be the difference between success and failure in today's marketplace. FICO Xpress Optimization allows businesses to solve their toughest problems, faster. FICO’s deep portfolio of optimization options enables users to easily build, deploy and use optimization solutions that meet their needs. Standard capabilities include scalable high-performance solvers and algorithms, flexible modeling environments, rapid application development, comparative scenario analysis and reporting capabilities, for on-premises and cloud installations. Millions of variables can be processed at great speed and scalability, enabling business users to find better decisions to complex problems in minutes. With a rich feature set of advanced tools, FICO empowers business users to make faster, smarter, customer-focused decisions.
  • 18
    Unravel

    Unravel Data

    Unravel makes data work anywhere: on Azure, AWS, GCP, or in your own data center, optimizing performance, automating troubleshooting, and keeping costs in check. Unravel helps you monitor, manage, and improve your data pipelines in the cloud and on-premises to drive more reliable performance in the applications that power your business. Get a unified view of your entire data stack. Unravel collects performance data from every platform, system, and application on any cloud, then uses agentless technologies and machine learning to model your data pipelines from end to end. Explore, correlate, and analyze everything in your modern data and cloud environment. Unravel’s data model reveals dependencies, issues, and opportunities, how apps and resources are being used, and what’s working and what’s not. Don’t just monitor performance: quickly troubleshoot and rapidly remediate issues. Leverage AI-powered recommendations to automate performance improvements, lower costs, and prepare.
  • 19
    HyperCube

    BearingPoint

    Whatever your business need, discover hidden insights quickly and easily using HyperCube, the platform designed for the way data scientists work. Put your business data to work. Unlock understanding, discover unrealized opportunities, generate predictions, and avoid risks before they happen. HyperCube takes huge volumes of data and turns them into actionable insights. Whether you are a beginner in analytics or a machine learning expert, HyperCube is designed with you in mind. It is the Swiss Army knife of data science, combining proprietary and open source code to deliver a wide range of data analysis features straight out of the box or as business apps, customized just for you. We are constantly updating and perfecting our technology so we can deliver the most innovative, intuitive, and adaptable results. Choose from apps, data-as-a-service (DaaS), and vertical market solutions.
  • 20
    Talend Data Fabric
    Talend Data Fabric’s suite of cloud services efficiently handles all your integration and integrity challenges, on-premises or in the cloud, for any source and any endpoint. Deliver trusted data at the moment you need it, for every user, every time. Ingest and integrate data, applications, files, events, and APIs from any source or endpoint to any location, on-premises and in the cloud, more easily and quickly with an intuitive interface and no coding. Embed quality into data management and guarantee ironclad regulatory compliance with a thoroughly collaborative, pervasive, and cohesive approach to data governance. Make the most informed decisions based on high-quality, trustworthy data derived from batch and real-time processing and bolstered with market-leading data cleaning and enrichment tools. Get more value from your data by making it available internally and externally. Extensive self-service capabilities make building APIs easy and help improve customer engagement.
  • 21
    NFVgrid

    InterCloud Systems

    NFVgrid provides automated provisioning, analytics, monitoring, and life-cycle management for Virtual Network Function appliances managed through a single system. The NFVgrid web portal delivers a smooth user experience. The dashboard neatly presents all of the virtual appliances and services that a customer can roll out or terminate. NFVgrid automatically deploys virtual appliances with pre-provisioned settings and connects them to networks of choice. For advanced settings, virtual network appliances can be accessed later through the web portal or CLI. No system in today’s network can perform in isolation, so we built NFVgrid with a rich set of RESTful APIs for easy integration with OSS and BSS systems, including billing. NFVgrid provides performance monitoring functions as well as a meaningful representation of the different kinds of analytical data for the traffic passing through the network or a particular VM.
  • 22
    SnapLogic

    SnapLogic

    Quickly ramp up, learn, and use SnapLogic to create multi-point, enterprise-wide app and data integrations. Easily expose and manage pipeline APIs that extend your world. Eliminate slower, manual, error-prone methods and deliver faster results for business processes such as customer onboarding, employee onboarding and off-boarding, quote to cash, ERP SKU forecasting, support ticket creation, and more. Monitor, manage, secure, and govern your data pipelines, application integrations, and API calls, all from a single pane of glass. Launch automated workflows for any department, across your enterprise, in minutes, not days. To deliver superior employee experiences, the SnapLogic platform can bring together employee data across all your enterprise HR apps and data stores. Learn how SnapLogic can help you quickly set up seamless experiences powered by automated processes.
  • 23
    matchit

    360Science

    The foundation of our matching software, matchit® is designed specifically to deliver results that mirror human-like perception, at scale and without preprocessing. Using Artificial Intelligence, a proprietary phonetic algorithm, lexicons, and a contextual scoring engine, matchit defeats the errors, inconsistencies, and challenges commonly found in contact and business data. Conventional matching solutions require a user to define matching logic, which is a combination of functions and off-the-shelf fuzzy algorithms, used to produce an alphanumeric value. This alphanumeric value, or ‘match key’, forms the basis for comparing two records together and ultimately finding matches. Unlike conventional matching solutions, matchit doesn’t rely on a single comparison between match keys to find a match. Instead, matchit evaluates records contextually, running a variety of comparisons and scoring them individually to grade similarity between all the relevant elements that make up your data.
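The contrast drawn above, between comparing a single match key and scoring fields individually, can be illustrated with a small sketch. This is not matchit's algorithm: the key recipe, the field weights, and the sample records are hypothetical, and Python's standard `difflib` stands in for any phonetic or proprietary comparison.

```python
from difflib import SequenceMatcher


def match_key(record):
    """Conventional approach: collapse a record into one alphanumeric key."""
    name = "".join(ch for ch in record["name"].upper() if ch.isalnum())
    postcode = "".join(ch for ch in record["postcode"].upper() if ch.isalnum())
    return name[:4] + postcode


def field_similarity(a, b):
    """Similarity in [0, 1] between two strings, ignoring case."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


# Hypothetical weights; a real engine would tune or learn these.
WEIGHTS = {"name": 0.5, "company": 0.3, "postcode": 0.2}


def contextual_score(rec_a, rec_b):
    """Score each field separately, then combine into one weighted grade."""
    return sum(w * field_similarity(rec_a[f], rec_b[f]) for f, w in WEIGHTS.items())


a = {"name": "Jon Smith", "company": "Acme Ltd", "postcode": "AB1 2CD"}
b = {"name": "John Smith", "company": "ACME Limited", "postcode": "AB12CD"}

keys_equal = match_key(a) == match_key(b)  # a one-letter spelling change breaks the key
score = contextual_score(a, b)             # field-level scoring still grades them as close
print(keys_equal, round(score, 2))
```

In this example the two keys differ because of one extra letter in the name, while the per-field score remains high, which is the failure mode of single-key matching that the description refers to.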
  • 24
    Proficio

    Proficio

    Proficio’s Managed Detection and Response (MDR) solution surpasses the capabilities of traditional Managed Security Services Providers (MSSPs). Our MDR service is powered by next-generation cybersecurity technology, and our security experts partner with you to become an extension of your team, continuously monitoring and investigating threats from our global network of security operations centers. Proficio’s advanced approach to threat detection leverages an extensive library of security use cases, the MITRE ATT&CK® framework, AI-based threat hunting models, business context modeling, and a threat intelligence platform. Through our global network of Security Operations Centers (SOCs), Proficio experts monitor, investigate, and triage suspicious events. We significantly reduce the number of false positives and provide actionable alerts with remediation recommendations. Proficio is a leader in Security Orchestration Automation and Response (SOAR).
  • 25
    Apache Flink

    Apache Software Foundation

    Apache Flink is a framework and distributed processing engine for stateful computations over unbounded and bounded data streams. Flink has been designed to run in all common cluster environments and to perform computations at in-memory speed and at any scale. Any kind of data is produced as a stream of events: credit card transactions, sensor measurements, machine logs, and user interactions on a website or mobile application are all generated as streams. Apache Flink excels at processing unbounded and bounded data sets. Precise control of time and state enables Flink’s runtime to run any kind of application on unbounded streams. Bounded streams are internally processed by algorithms and data structures that are specifically designed for fixed-size data sets, yielding excellent performance. Flink is designed to work well with common cluster resource managers such as Hadoop YARN and Kubernetes.
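A "stateful computation over a stream" means each incoming event updates state that persists between events. The sketch below shows that idea in plain Python with a generator. It does not use Flink's API, and the card identifiers and amounts are hypothetical.

```python
from collections import defaultdict


def running_totals(events):
    """Yield (key, running_total) for each event, keeping per-key state.

    `events` may be any iterable, including an endless generator, so the
    same function handles bounded and unbounded input.
    """
    state = defaultdict(float)  # keyed state: card id -> total spent so far
    for card_id, amount in events:
        state[card_id] += amount
        yield card_id, state[card_id]


# A bounded stream of hypothetical card transactions.
transactions = [("card-1", 20.0), ("card-2", 5.0), ("card-1", 12.5)]
results = list(running_totals(transactions))
print(results)
```

A stream processor like Flink additionally distributes this keyed state across machines and makes it fault tolerant, which a single-process generator does not attempt.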
  • 26
    Enterprise Recon

    Ground Labs

    With Enterprise Recon by Ground Labs, organizations can find and remediate sensitive information across the broadest range of structured and unstructured data, whether it’s stored on your servers, on your employees’ devices, or in the cloud. Enterprise Recon enables organizations worldwide to seamlessly discover all of their data and comply with GDPR, PCI DSS, CCPA, HIPAA, Australian Privacy and other data security standards that require the ability to locate and secure PII data as well as information on gender, ethnicity and health… or even non-PII financial data. Enterprise Recon is powered by GLASS™, Ground Labs' proprietary technology that enables the quickest and most accurate data discovery across the broadest set of platforms available. Enterprise Recon natively supports sensitive data discovery on Windows, macOS, Linux, FreeBSD, Solaris, HP-UX and IBM AIX using agent and agentless options. Additional remote options also enable almost any network data stored.
  • 27
    ContextIQ
    Online consumers prefer recommendations that are relevant to their needs or interests. You can now give them an enhanced experience with behavior profiling and contextual targeting. Use our recommendation engine to offer more focused personalization. Keep your visitors hooked with personalized content. The more time a user spends on a site, the higher the chance of a conversion. Help shoppers find items buried deep within your eCommerce store. Increase sales through timely and intelligent product recommendations. Showcase products or content that interest the user. Only relevant suggestions capture user attention and lead to fruitful interactions. ContextIQ is an easy-to-deploy personalization solution that uses collaborative filtering algorithms to produce recommendations. It is capable of suggesting content to users through behavioral targeting.
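Collaborative filtering, mentioned above, recommends items based on the behavior of other users rather than on item descriptions. The following is a minimal item-based sketch in plain Python. It is not ContextIQ's implementation, and the users, items, and `INTERACTIONS` data are hypothetical.

```python
from math import sqrt

# Hypothetical interaction data: user -> set of items viewed or bought.
INTERACTIONS = {
    "alice": {"tent", "sleeping_bag", "stove"},
    "bob": {"tent", "sleeping_bag"},
    "carol": {"tent", "stove", "lantern"},
    "dave": {"novel"},
}


def cosine(users_a, users_b):
    """Cosine similarity between two items, each represented by its set of users."""
    if not users_a or not users_b:
        return 0.0
    return len(users_a & users_b) / sqrt(len(users_a) * len(users_b))


def recommend(user, interactions, top_n=2):
    """Rank unseen items by their summed similarity to items the user already has."""
    item_users = {}
    for u, items in interactions.items():
        for item in items:
            item_users.setdefault(item, set()).add(u)

    seen = interactions[user]
    scores = {}
    for candidate, users in item_users.items():
        if candidate in seen:
            continue
        scores[candidate] = sum(cosine(users, item_users[s]) for s in seen)
    # Highest score first; ties broken alphabetically for a stable order.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [item for item, score in ranked[:top_n] if score > 0]


recommendations = recommend("bob", INTERACTIONS)
print(recommendations)
```

Here "stove" ranks above "lantern" for bob because more of the users who share his items also have a stove, while "novel" is never suggested since no overlapping user has it.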
  • 28
    CloudSwyft

    CloudSwyft

    CloudSwyft has built one of the fastest growing end-to-end cloud-based technology learning platforms globally, focused on supporting the innovative delivery of modern 21st century technology skills training and credentialing to meet the demands of rapid digital transformation. We provide cloud-based learning platforms, customized hands-on labs, digital credentialing and an innovative blended learning experience product. We provide this technology to a broad range of higher learning institutions, governments and corporates across our home markets of Asia Pacific and the Middle East and to the world’s largest MOOC providers. With our technology content partners, Microsoft and UiPath, we have used this same technology to deliver premium online technology skills training to these same customers and direct to individual learners in partnership with a broad range of leading B2C platforms.
  • 29
    CYRES

    CYRES

    The best solution to guarantee a high level of security on all your equipment and data. Choose Exchange, the most complete and secure business messaging solution on the market. By relying on Cloudera, centralize, process, and analyze your data within flexible cloud platforms, in an industrialized and secure manner. Launch microservices architectures with the Docker containerization platform and automate deployment to production with GitLab. Take advantage of our managed services to integrate the AWS or Azure cloud. Deploy your applications in the most efficient environments on the market. Use Veeam Cloud Connect to deploy your disaster recovery or business continuity plan (PRA/PCA) or to outsource your virtual machine backups. Your private cloud to respond with agility to the rapid evolution of your business. The cloud benchmark on which millions of companies are already relying to gain agility. A wide range of cloud solutions to create VMs in seconds.
  • 30
    SSIS Integration Toolkit
    Jump right to our product page to see our full range of data integration software, including solutions for SharePoint and Active Directory. With over 300 individual data integration tools for connectivity and productivity, our data integration solutions allow developers to take advantage of the flexibility and power of the SSIS ETL engine to integrate virtually any application or data source. You don't have to write a single line of code to make data integration happen so your development can be done in a matter of minutes. We make the most flexible integration solution on the market. Our software offers intuitive user interfaces that are flexible and easy to use. With a streamlined development experience and an extremely simple licensing model, our solution offers the best value for your investment. Our software offers many specifically designed features that help you achieve the best possible performance without having to hijack your budget.