23 Integrations with Apache Flink

Below is a list of software that integrates with Apache Flink. Compare the features, ratings, user reviews, and pricing of each integration. Here are the current Apache Flink integrations in 2024:

  • 1
    StarTree

    StarTree

    StarTree Cloud is a fully-managed real-time analytics platform designed for OLAP at massive speed and scale for user-facing applications. Powered by Apache Pinot, StarTree Cloud provides enterprise-grade reliability and advanced capabilities such as tiered storage, scalable upserts, plus additional indexes and connectors. It integrates seamlessly with transactional databases and event streaming platforms, ingesting data at millions of events per second and indexing it for lightning-fast query responses. StarTree Cloud is available on your favorite public cloud or for private SaaS deployment.
    • Gain critical real-time insights to run your business
    • Seamlessly integrate data streaming and batch data
    • High performance in throughput and low-latency at petabyte scale
    • Fully-managed cloud service
    • Tiered storage to optimize cloud performance & spend
    • Fully-secure & enterprise-ready
  • 2
    Netdata

    Netdata, Inc.

    The open-source observability platform everyone needs! Netdata collects metrics per second and presents them in beautiful low-latency dashboards. It is designed to run on all of your physical and virtual servers, cloud deployments, Kubernetes clusters, and edge/IoT devices, to monitor your systems, containers, and applications. It scales nicely from just a single server to thousands of servers, even in complex multi/mixed/hybrid cloud environments, and given enough disk space it can keep your metrics for years.
    KEY FEATURES:
    💥 Collects metrics from 800+ integrations
    💪 Real-Time, Low-Latency, High-Resolution
    😶‍🌫️ Unsupervised Anomaly Detection
    🔥 Powerful Visualization
    🔔 Out of box Alerts
    📖 systemd Journal Logs Explorer
    😎 Low Maintenance
    ⭐ Open and Extensible
    Try Netdata today and feel the pulse of your infrastructure, with high-resolution metrics, journal logs and real-time visualizations.
    Starting Price: Free
  • 3
    Scalytics Connect
    Scalytics Connect enables AI and ML to process and analyze data, and makes it easier and more secure to use different data processing platforms at the same time. Built by the inventors of Apache Wayang, Scalytics Connect is a data management and ETL platform that dramatically reduces the complexity of ETL data pipelines and helps organizations unlock the power of their data, regardless of where it resides. It empowers businesses to break down data silos, simplify access, and gain valuable insights through a variety of features, including:
    - AI-powered ETL: Automates tasks like data extraction, transformation, and loading, freeing up your resources for more strategic work.
    - Unified Data Landscape: Breaks down data silos and provides a holistic view of all your data, regardless of its location or format.
    - Effortless Scaling: Handles growing data volumes with ease, so you never get bottlenecked by information overload.
    Starting Price: $0
  • 4
    Kubernetes

    Kubernetes

    Kubernetes (K8s) is an open-source system for automating deployment, scaling, and management of containerized applications. It groups containers that make up an application into logical units for easy management and discovery. Kubernetes builds upon 15 years of experience of running production workloads at Google, combined with best-of-breed ideas and practices from the community. Designed on the same principles that allow Google to run billions of containers a week, Kubernetes can scale without increasing your ops team. Whether testing locally or running a global enterprise, Kubernetes' flexibility grows with you to deliver your applications consistently and easily, no matter how complex your needs are. Kubernetes is open source, giving you the freedom to take advantage of on-premises, hybrid, or public cloud infrastructure and letting you effortlessly move workloads to where it matters to you.
    Starting Price: Free
  • 5
    Apache Iceberg

    Apache Software Foundation

    Iceberg is a high-performance format for huge analytic tables. Iceberg brings the reliability and simplicity of SQL tables to big data, while making it possible for engines like Spark, Trino, Flink, Presto, Hive and Impala to safely work with the same tables, at the same time. Iceberg supports flexible SQL commands to merge new data, update existing rows, and perform targeted deletes. Iceberg can eagerly rewrite data files for read performance, or it can use delete deltas for faster updates. Iceberg handles the tedious and error-prone task of producing partition values for rows in a table and skips unnecessary partitions and files automatically. No extra filters are needed for fast queries, and the table layout can be updated as data or queries change.
    Starting Price: Free
  • 6
    Warp 10
    Warp 10 is a modular open source platform that collects, stores, and analyzes data from sensors. Shaped for the IoT with a flexible data model, Warp 10 provides a unique and powerful framework to simplify your processes from data collection to analysis and visualization, with support for geolocated data in its core model (called Geo Time Series). Warp 10 is both a time series database and a powerful analytics environment that supports statistics, feature extraction for training models, data filtering and cleaning, pattern and anomaly detection, synchronization, and even forecasting. The analysis environment can be implemented within a large ecosystem of software components such as Spark, Kafka Streams, Hadoop, Jupyter, Zeppelin and many more. It can also access data stored in many existing solutions, including relational or NoSQL databases, search engines and S3-type object storage systems.
  • 7
    Ververica

    Ververica

    Ververica Platform enables every enterprise to take advantage of, and derive immediate insight from, its data in real time. Powered by Apache Flink's robust streaming runtime, Ververica Platform makes this possible by providing an integrated solution for stateful stream processing and streaming analytics at scale. It provides high-throughput, low-latency data processing, powerful abstractions and the operational flexibility trusted by some of the world’s largest and most successful data-driven enterprises such as Alibaba, Netflix and Uber. Ververica Platform brings the accumulated knowledge of our experience working with some of these large and innovative, data-driven companies into an easily-accessible, cost-effective and secure enterprise-ready platform.
  • 8
    DeltaStream

    DeltaStream

    DeltaStream is a unified serverless stream processing platform that integrates with streaming storage services. Think of it as the compute layer on top of your streaming storage. It provides the functionality of streaming analytics (stream processing) and streaming databases, along with additional features, to provide a complete platform to manage, process, secure and share streaming data. DeltaStream provides a SQL-based interface where you can easily create stream processing applications such as streaming pipelines, materialized views, microservices and many more. It has a pluggable processing engine and currently uses Apache Flink as its primary stream processing engine. DeltaStream is more than just a query processing layer on top of Kafka or Kinesis. It brings relational database concepts to the data streaming world, including namespacing and role-based access control, enabling you to securely access, process and share your streaming data regardless of where it is stored.
  • 9
    Apache Doris

    The Apache Software Foundation

    Apache Doris is a modern data warehouse for real-time analytics. It delivers lightning-fast analytics on real-time data at scale. Push-based micro-batch and pull-based streaming data ingestion within a second. Storage engine with real-time upsert, append and pre-aggregation. Optimized for high-concurrency and high-throughput queries with a columnar storage engine, MPP architecture, cost-based query optimizer, and vectorized execution engine. Federated querying of data lakes such as Hive, Iceberg and Hudi, and databases such as MySQL and PostgreSQL. Compound data types such as Array, Map and JSON. Variant data type to support automatic data type inference for JSON data. NGram bloom filter and inverted index for text searches. Distributed design for linear scalability. Workload isolation and tiered storage for efficient resource management. Supports shared-nothing clusters as well as separation of storage and compute.
    Starting Price: Free
  • 10
    Hue

    Hue

    Hue brings the best querying experience with the most intelligent autocomplete and query editor components. The tables and storage browsers leverage your existing data catalog knowledge transparently. Help users find the correct data among thousands of databases and self-document it. Assist users with their SQL queries and leverage rich previews for links, sharing from the editor directly in Slack. Several apps are available, each one specialized in a certain type of querying. Data sources can be explored first via the browsers. The editor shines for SQL queries. It comes with an intelligent autocomplete, risk alerts, and self-service troubleshooting. Dashboards focus on visualizing indexed data but can also query SQL databases. You can now search for certain cell values in the table, and the results are highlighted. To improve your SQL editing experience, Hue comes with one of the best SQL autocompletes on the planet.
    Starting Price: Free
  • 11
    Apache Mesos

    Apache Software Foundation

    Mesos is built using the same principles as the Linux kernel, only at a different level of abstraction. The Mesos kernel runs on every machine and provides applications (e.g., Hadoop, Spark, Kafka, Elasticsearch) with APIs for resource management and scheduling across entire datacenter and cloud environments. Native support for launching containers with Docker and AppC images. Support for running cloud native and legacy applications in the same cluster with pluggable scheduling policies. HTTP APIs for developing new distributed applications, for operating the cluster, and for monitoring. Built-in Web UI for viewing cluster state and navigating container sandboxes.
  • 12
    E-MapReduce
    EMR is an all-in-one enterprise-ready big data platform that provides cluster, job, and data management services based on open-source ecosystems, such as Hadoop, Spark, Kafka, Flink, and Storm. Alibaba Cloud Elastic MapReduce (EMR) is a big data processing solution that runs on the Alibaba Cloud platform. EMR is built on Alibaba Cloud ECS instances and is based on open-source Apache Hadoop and Apache Spark. EMR allows you to use the Hadoop and Spark ecosystem components, such as Apache Hive, Apache Kafka, Flink, Druid, and TensorFlow, to analyze and process data. You can use EMR to process data stored on different Alibaba Cloud data storage services, such as Object Storage Service (OSS), Log Service (SLS), and Relational Database Service (RDS). You can quickly create clusters without the need to configure hardware and software. All maintenance operations are completed through its web interface.
  • 13
    Foundational

    Foundational

    Identify code and optimization issues in real-time, prevent data incidents pre-deploy, and govern data-impacting code changes end to end—from the operational database to the user-facing dashboard. Automated, column-level data lineage, from the operational database all the way to the reporting layer, ensures every dependency is analyzed. Foundational automates data contract enforcement by analyzing every repository from upstream to downstream, directly from source code. Use Foundational to proactively identify code and data issues, find and prevent issues, and create controls and guardrails. Foundational can be set up in minutes with no code changes required.
  • 14
    Deep.BI

    Deep BI

    Deep.BI enables Media, Insurance, E-commerce and Banking enterprises to effectively increase revenues by anticipating specific user behaviors then automating actions to convert these users to paying customers and retaining them. Predictive customer data platform with real-time user scoring, based on Deep.BI's next-gen, enterprise data warehouse. We help digital businesses and platforms improve their products, content and distribution. Deep.BI's platform collects extensive data about product usage and content consumption and provides real-time, actionable insights. Real-time, actionable insights are generated within seconds through the Deep.Conveyor data pipeline, available for analysis in the Deep.Explorer business intelligence platform, augmented through the Deep.Score event scoring engine built with custom AI algorithms for your use case, and are ready for automation using the Deep.Conductor high-speed API and AI model serving platform.
  • 15
    Hadoop

    Apache Software Foundation

    The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than rely on hardware to deliver high-availability, the library itself is designed to detect and handle failures at the application layer, so delivering a highly-available service on top of a cluster of computers, each of which may be prone to failures. A wide variety of companies and organizations use Hadoop for both research and production. Users are encouraged to add themselves to the Hadoop PoweredBy wiki page. Apache Hadoop 3.3.4 incorporates a number of significant enhancements over the previous major release line (hadoop-3.2).
  • 16
    Alibaba Log Service
    Log Service is a complete real-time data logging service that has been developed by Alibaba Group. Log Service supports collection, consumption, shipping, search, and analysis of logs, and improves the capacity of processing and analyzing large amounts of logs. Completes data collections from more than 30 data sources within five minutes. Deploys reliable high-availability service nodes in data centers around the world. Fully supports real-time and offline computing, and seamlessly connects to Alibaba Cloud software, open-source software, and commercial software. You can set the access permissions for individual rows so that the same report is displayed differently for each user role.
  • 17
    Apache Knox

    Apache Software Foundation

    The Knox API Gateway is designed as a reverse proxy with consideration for pluggability in the areas of policy enforcement, through providers and the backend services for which it proxies requests. Policy enforcement ranges from authentication/federation, authorization, audit, dispatch, hostmapping and content rewrite rules. Policy is enforced through a chain of providers that are defined within the topology deployment descriptor for each Apache Hadoop cluster gated by Knox. The cluster definition is also defined within the topology deployment descriptor and provides the Knox Gateway with the layout of the cluster for purposes of routing and translation between user facing URLs and cluster internals. Each Apache Hadoop cluster that is protected by Knox has its set of REST APIs represented by a single cluster specific application context path. This allows the Knox Gateway to both protect multiple clusters and present the REST API consumer with a single endpoint.
  • 18
    lakeFS

    Treeverse

    lakeFS enables you to manage your data lake the way you manage your code. Run parallel pipelines for experimentation and CI/CD for your data. Simplifying the lives of engineers, data scientists and analysts who are transforming the world with data. lakeFS is an open source platform that delivers resilience and manageability to object-storage based data lakes. With lakeFS you can build repeatable, atomic and versioned data lake operations, from complex ETL jobs to data science and analytics. lakeFS supports AWS S3, Azure Blob Storage and Google Cloud Storage (GCS) as its underlying storage service. It is API compatible with S3 and works seamlessly with all modern data frameworks such as Spark, Hive, AWS Athena, Presto, etc. lakeFS provides a Git-like branching and committing model that scales to exabytes of data by utilizing S3, GCS, or Azure Blob for storage.
  • 19
    Apache Zeppelin
    Web-based notebook that enables data-driven, interactive data analytics and collaborative documents with SQL, Scala and more. The IPython interpreter provides a user experience comparable to Jupyter Notebook. This release includes note-level dynamic forms, a note revision comparator, and the ability to run paragraphs sequentially, instead of the simultaneous paragraph execution of previous releases. The interpreter lifecycle manager automatically terminates the interpreter process on idle timeout, so resources are released when they're not in use.
  • 20
    Apache Kudu

    The Apache Software Foundation

    A Kudu cluster stores tables that look just like tables you're used to from relational (SQL) databases. A table can be as simple as a binary key and value, or as complex as a few hundred different strongly-typed attributes. Just like SQL, every table has a primary key made up of one or more columns. This might be a single column like a unique user identifier, or a compound key such as a (host, metric, timestamp) tuple for a machine time-series database. Rows can be efficiently read, updated, or deleted by their primary key. Kudu's simple data model makes it a breeze to port legacy applications or build new ones, no need to worry about how to encode your data into binary blobs or make sense of a huge database full of hard-to-interpret JSON. Tables are self-describing, so you can use standard tools like SQL engines or Spark to analyze your data. Kudu's APIs are designed to be easy to use.
  • 21
    Apache Hudi

    Apache Software Foundation

    Hudi is a rich platform to build streaming data lakes with incremental data pipelines on a self-managing database layer, while being optimized for lake engines and regular batch processing. Hudi maintains a timeline of all actions performed on the table at different instants of time that helps provide instantaneous views of the table, while also efficiently supporting retrieval of data in the order of arrival. A Hudi instant consists of the following components. Hudi provides efficient upserts by mapping a given hoodie key consistently to a file id via an indexing mechanism. This mapping between record key and file group/file id never changes once the first version of a record has been written to a file. In short, the mapped file group contains all versions of a group of records.
  • 22
    VeloDB

    VeloDB

    Powered by Apache Doris, VeloDB is a modern data warehouse for lightning-fast analytics on real-time data at scale. Push-based micro-batch and pull-based streaming data ingestion within seconds. Storage engine with real-time upsert, append and pre-aggregation. Unparalleled performance in both real-time data serving and interactive ad-hoc queries. Not just structured but also semi-structured data. Not just real-time analytics but also batch processing. Not just queries against internal data but also a federated query engine to access external data lakes and databases. Distributed design to support linear scalability. Whether on-premise deployment or cloud service, separation or integration of storage and compute, resource usage can be flexibly and efficiently adjusted according to workload requirements. Built on and fully compatible with open source Apache Doris. Supports the MySQL protocol, functions, and SQL for easy integration with other data tools.
  • 23
    Arroyo

    Arroyo

    Scale from zero to millions of events per second. Arroyo ships as a single, compact binary. Run locally on macOS or Linux for development, and deploy to production with Docker or Kubernetes. Arroyo is a new kind of stream processing engine, built from the ground up to make real-time easier than batch. Arroyo was designed from the start so that anyone with SQL experience can build reliable, efficient, and correct streaming pipelines. Data scientists and engineers can build end-to-end real-time applications, models, and dashboards, without a separate team of streaming experts. Transform, filter, aggregate, and join data streams by writing SQL, with sub-second results. Your streaming pipelines shouldn't page someone just because Kubernetes decided to reschedule your pods. Arroyo is built to run in modern, elastic cloud environments, from simple container runtimes like Fargate to large, distributed deployments on Kubernetes.