23 Integrations with Apache Parquet

View a list of Apache Parquet integrations and software that integrates with Apache Parquet below. Compare the best Apache Parquet integrations as well as features, ratings, user reviews, and pricing of software that integrates with Apache Parquet. Here are the current Apache Parquet integrations in 2024:

  • 1
    StarfishETL


    StarfishETL is an Integration Platform as a Service (iPaaS), and although “integration” is in the name, it’s capable of much more. An iPaaS lives in the cloud and integrates different systems through their APIs, which makes it adaptable beyond integration to migration, data governance, and data cleansing. Unlike traditional integration apps, StarfishETL provides low-code mapping and powerful scripting tools to manage, personalize, and manipulate data at scale. Features: drag-and-drop mapping, AI-powered connections, purpose-built integrations, extensibility through scripting, secure on-premises connections, and scalable data capacity.
    Starting Price: 400/month
  • 2
    Flyte

    Union.ai

    The workflow automation platform for complex, mission-critical data and ML processes at scale. Flyte makes it easy to create concurrent, scalable, and maintainable workflows for machine learning and data processing. Flyte is used in production at Lyft, Spotify, Freenome, and others. At Lyft, Flyte has served production model training and data processing for over four years, becoming the de facto platform for teams like pricing, locations, ETA, mapping, and autonomous. Flyte manages over 10,000 unique workflows at Lyft, totaling over 1,000,000 executions every month, 20 million tasks, and 40 million containers. It is entirely open source with an Apache 2.0 license under the Linux Foundation, with a cross-industry overseeing committee. Configuring machine learning and data workflows in YAML can get complex and error-prone; Flyte instead lets you define workflows as code.
    Starting Price: Free
  • 3
    PI.EXCHANGE


    Easily connect your data to the engine, either by uploading a file or connecting to a database. Then start analyzing your data through visualizations, or prepare it for machine learning modeling using data wrangling actions with repeatable recipes. Get the most out of your data by building machine learning models with regression, classification, or clustering algorithms, all without any code. Uncover insights into your data using the feature importance, prediction explanation, and what-if tools. Make predictions and integrate them seamlessly into your existing systems through our ready-to-go connectors, so you can start taking action.
    Starting Price: $39 per month
  • 4
    Warp 10
    Warp 10 is a modular open source platform that collects, stores, and analyzes data from sensors. Shaped for the IoT with a flexible data model, Warp 10 provides a unique and powerful framework to simplify your processes from data collection to analysis and visualization, with support for geolocated data in its core model (called Geo Time Series). Warp 10 is both a time series database and a powerful analytics environment, allowing you to compute statistics, extract features for training models, filter and clean data, detect patterns and anomalies, synchronize series, and even produce forecasts. The analysis environment can be integrated with a large ecosystem of software components such as Spark, Kafka Streams, Hadoop, Jupyter, Zeppelin, and many more. It can also access data stored in many existing solutions: relational or NoSQL databases, search engines, and S3-type object storage systems.
  • 5
    Indexima Data Hub
    Reshape your perception of time in data analytics. Access your business’s data instantly and work directly on your dashboard without going back and forth with the IT team. Meet Indexima DataHub, a space where operational and functional users gain instant access to their data. Combining its unique indexing engine with machine learning, Indexima lets businesses access all their data to simplify and speed up analytics. Robust and scalable, the solution allows organizations to query all their data directly at the source, across volumes of tens of billions of rows, in just a few milliseconds. The Indexima platform lets users implement instant analytics on all their data in just one click. With Indexima’s new ROI and TCO calculator, find out in 30 seconds the ROI of your data platform: infrastructure costs, project deployment time, and data engineering costs, while boosting analytical performance.
    Starting Price: $3,290 per month
  • 6
    Tonic Ephemeral
    Stop wasting time provisioning and maintaining databases yourself. Effortlessly create isolated test databases to ship features faster. Equip your developers with the ready-to-go data they need to keep fast-paced projects on track. Spin up pre-populated databases for testing purposes as part of your CI/CD pipeline, and automatically tear them down once the tests are done. Quickly and painlessly spin up databases at the click of a button for testing, bug reproduction, demos, and more with built-in container orchestration. Use our patented subsetter to shrink PBs down to GBs without breaking referential integrity, then leverage Tonic Ephemeral to spin up a database with only the data needed for development, cutting cloud costs and maximizing efficiency. Pairing the patented subsetter with Tonic Ephemeral gives you all the data subsets you need for only as long as you need them. Maximize efficiency by giving your developers access to one-off datasets for local development.
    Starting Price: $199 per month
  • 7
    PuppyGraph


    PuppyGraph empowers you to seamlessly query one or multiple data stores as a unified graph model. Graph databases are expensive, take months to set up, and need a dedicated team. Traditional graph databases can take hours to run multi-hop queries and struggle beyond 100 GB of data. A separate graph database complicates your architecture with brittle ETLs and inflates your total cost of ownership (TCO). Connect to any data source anywhere, with cross-cloud and cross-region graph analytics; no complex ETLs or data replication are required. PuppyGraph enables you to query your data as a graph by directly connecting to your data warehouses and lakes, eliminating the need to build and maintain the time-consuming ETL pipelines required by a traditional graph database setup. No more waiting for data and failed ETL processes. PuppyGraph eradicates graph scalability issues by separating computation and storage.
    Starting Price: Free
  • 8
    Timeplus


    Timeplus is a simple, powerful, and cost-efficient stream processing platform. All in a single binary, easily deployed anywhere. We help data teams process streaming and historical data quickly and intuitively, in organizations of all sizes and industries. Lightweight, single binary, without dependencies. End-to-end streaming and historical analytics at 1/10 the cost of similar open source frameworks. Turn real-time market and transaction data into real-time insights. Leverage append-only streams and key-value streams to monitor financial data. Implement real-time feature pipelines using Timeplus. One platform for all infrastructure logs, metrics, and traces, the three pillars supporting observability. Timeplus supports a wide range of data sources in its web console UI; you can also push data via REST API, or create external streams without copying data into Timeplus.
    Starting Price: $199 per month
  • 9
    iDiscover

    Mage Data

    Uncover hidden sensitive data locations within your enterprise through Mage's patented Sensitive Data Discovery module. Find data hidden in all types of data stores in the most obscure locations, be it structured, unstructured, Big Data, or on the cloud. Leverage the power of artificial intelligence and natural language processing to uncover data in the most complex of locations. Ensure efficient identification of sensitive data with minimal false positives through a patented approach to data discovery. Configure additional data classifications over and above the 70+ out-of-the-box classifications covering all popular PII and PHI data. Schedule sample, full, or even incremental scans through a simplified discovery process.
  • 10
    Blotout


    Activate customer journeys with complete visibility using infrastructure-as-code. Blotout’s SDK offers companies all of the analytics and remarketing tools they are accustomed to, while offering best-in-class privacy preservation for the company’s users. Blotout’s SDK is out-of-the-box compliant with the GDPR, CCPA, and COPPA. It uses on-device, distributed edge computing for analytics, messaging, and remarketing, all without using personal data, device IDs, or IP addresses. Measure, attribute, optimize, and activate customer data with 100% customer coverage. The only stack that gives you the complete customer lifecycle by unifying event, online, and offline data sources. Establish a trusted data relationship with your customers to build loyalty and maintain compliance with the GDPR and global privacy laws.
  • 11
    Gravity Data
    Gravity's mission is to make streaming data easy from over 100 sources while you pay only for what you use. Gravity removes the reliance on engineering teams to deliver streaming pipelines, with a simple interface to get streaming up and running in minutes from databases, event data, and APIs. Everyone in the data team can now build with simple point and click, so you can focus on building apps, services, and customer experiences. Full execution trace and detailed error messaging enable quick diagnosis and resolution. We have implemented new, feature-rich ways for you to get started quickly, from bulk set-up, default schemas, and data selection to different job modes and statuses. Spend less time wrangling with infrastructure and more time analysing data while our intelligent engine keeps your pipelines running. Gravity integrates with your systems for notifications and orchestration.
  • 12
    Meltano


    Meltano provides the ultimate flexibility in deployment options. Own your data stack, end to end. An ever-growing library of 300+ connectors has been running in production for years. Run workflows in isolated environments, execute end-to-end tests, and version control everything. Open source gives you the power to build your ideal data stack. Define your entire project as code and collaborate confidently with your team. The Meltano CLI enables you to rapidly create your project, making it easy to start replicating data. Meltano is designed to be the best way to run dbt to manage your transformations. Your entire data stack is defined in your project, making it simple to deploy to production. Validate your changes in development before moving to CI, and in staging before moving to production.
  • 13
    Semarchy xDI
    Experience Semarchy’s flexible unified data platform to empower better business decisions enterprise-wide. Integrate all your data with xDI, the high-performance, agile, and extensible data integration platform for all styles and use cases. Its single technology federates all forms of data integration, and its mapping converts business rules into deployable code. xDI has an extensible and open architecture supporting on-premise, cloud, hybrid, and multi-cloud environments.
  • 14
    Autymate


    Our one-time, no-code integrations work with 200+ of the world’s biggest platforms. From HR and payroll to managing customers and vendors, you can connect everyone with everything without lifting a finger. We made our interface so intuitive that it looks like you are doing the automation within QuickBooks itself. Seamlessly integrate QuickBooks and your accounting systems, eliminating data entry and boosting your team's productivity. Make accounting effortless for your franchise business. Stay ahead of your competition and keep your customers longer with a white-labeled accounting automation app. Connect your enterprise's most complex systems in one easy workflow and automate all the busy work in between. Your accountants can get back to what they love and work on more meaningful tasks with greater impact.
  • 15
    Hadoop

    Apache Software Foundation

    The Apache Hadoop software library is a framework that allows for the distributed processing of large data sets across clusters of computers using simple programming models. It is designed to scale up from single servers to thousands of machines, each offering local computation and storage. Rather than relying on hardware to deliver high availability, the library itself is designed to detect and handle failures at the application layer, thus delivering a highly available service on top of a cluster of computers, each of which may be prone to failure. A wide variety of companies and organizations use Hadoop for both research and production. Users are encouraged to add themselves to the Hadoop PoweredBy wiki page. Apache Hadoop 3.3.4 incorporates a number of significant enhancements over the previous major release line (hadoop-3.2).
  • 16
    IBM Db2 Event Store
    IBM Db2 Event Store is a cloud-native database system designed to handle massive amounts of structured data stored in the Apache Parquet format. Optimized for event-driven data processing and analysis, this high-speed data store can capture, analyze, and store more than 250 billion events per day. The data store is flexible and scalable, adapting quickly to your changing business needs. With the Db2 Event Store service, you can create these data stores in your Cloud Pak for Data cluster so that you can govern the data and use it for more in-depth analysis. Rapidly ingest large amounts of streaming data (up to one million inserts per second per node) and use it for real-time analytics with integrated machine learning capabilities. For example, analyze incoming data from different medical devices in real time to provide better health outcomes for patients while reducing the cost of moving the data to storage.
  • 17
    SSIS Integration Toolkit
    Jump right to our product page to see our full range of data integration software, including solutions for SharePoint and Active Directory. With over 300 individual data integration tools for connectivity and productivity, our data integration solutions allow developers to take advantage of the flexibility and power of the SSIS ETL engine to integrate virtually any application or data source. You don't have to write a single line of code to make data integration happen, so your development can be done in a matter of minutes. We make the most flexible integration solution on the market. Our software offers intuitive user interfaces that are flexible and easy to use. With a streamlined development experience and an extremely simple licensing model, our solution offers the best value for your investment. Our software offers many specifically designed features that help you achieve the best possible performance without breaking your budget.
  • 18
    Amazon SageMaker Data Wrangler
    Amazon SageMaker Data Wrangler reduces the time it takes to aggregate and prepare data for machine learning (ML) from weeks to minutes. With SageMaker Data Wrangler, you can simplify the process of data preparation and feature engineering, and complete each step of the data preparation workflow (including data selection, cleansing, exploration, visualization, and processing at scale) from a single visual interface. You can use SQL to select the data you want from a wide variety of data sources and import it quickly. Next, you can use the Data Quality and Insights report to automatically verify data quality and detect anomalies, such as duplicate rows and target leakage. SageMaker Data Wrangler contains over 300 built-in data transformations so you can quickly transform data without writing any code. Once you have completed your data preparation workflow, you can scale it to your full datasets using SageMaker data processing jobs, then train, tune, and deploy models.
  • 19
    APERIO DataWise
    Data is used in every aspect of a processing plant or facility; it underlies most operational processes, most business decisions, and most environmental events. Failures are often attributed to this same data: operator error, bad sensors, safety or environmental events, or poor analytics. This is where APERIO can alleviate these problems. Data integrity is a key element of Industry 4.0, the foundation upon which more advanced applications, such as predictive models, process optimization, and custom AI tools, are developed. APERIO DataWise is the industry-leading provider of reliable, trusted data. Automate the quality of your PI data or digital twins continuously and at scale. Ensure validated data across the enterprise to improve asset reliability. Empower operators to make better decisions. Detect threats to operational data to ensure operational resilience. Accurately monitor and report sustainability metrics.
  • 20
    3LC


    Light up the black box: pip install 3lc to gain the clarity you need to make meaningful changes to your models in moments. Remove the guesswork from your model training and iterate fast. Collect per-sample metrics and visualize them in your browser. Analyze your training and eliminate issues in your dataset. Model-guided, interactive data debugging and enhancement. Find important or inefficient samples. Understand which samples work and where your model struggles. Improve your model in different ways by weighting your data. Make sparse, non-destructive edits to individual samples or in batch. Maintain a lineage of all changes and restore any previous revision. Dive deeper than standard experiment trackers with per-sample, per-epoch metrics and data tracking. Aggregate metrics by sample features, rather than just by epoch, to spot hidden trends. Tie each training run to a specific dataset revision for full reproducibility.
  • 21
    Arroyo


    Scale from zero to millions of events per second. Arroyo ships as a single, compact binary. Run locally on macOS or Linux for development, and deploy to production with Docker or Kubernetes. Arroyo is a new kind of stream processing engine, built from the ground up to make real-time easier than batch. Arroyo was designed from the start so that anyone with SQL experience can build reliable, efficient, and correct streaming pipelines. Data scientists and engineers can build end-to-end real-time applications, models, and dashboards, without a separate team of streaming experts. Transform, filter, aggregate, and join data streams by writing SQL, with sub-second results. Your streaming pipelines shouldn't page someone just because Kubernetes decided to reschedule your pods. Arroyo is built to run in modern, elastic cloud environments, from simple container runtimes like Fargate to large, distributed deployments on Kubernetes.
  • 22
    e6data


    Limited competition due to deep barriers to entry, specialized know-how, massive capital needs, and long time-to-market. Existing platforms are indistinguishable in price and performance, reducing the incentive to switch. Migrating from one engine’s SQL dialect to another involves months of effort. Truly format-neutral computing, interoperable with all major open standards. Enterprise data leaders face an unprecedented explosion in computing demand for data intelligence. They are surprised to find that 10% of their heavy, compute-intensive use cases consume 80% of the cost, engineering effort, and stakeholder complaints. Unfortunately, such workloads are also mission-critical and non-discretionary. e6data amplifies ROI on enterprises' existing data platforms and architecture. e6data’s truly format-neutral compute has the unique distinction of being equally efficient and performant across leading data lakehouse table formats.
  • 23
    Mage Platform

    Mage Data

    Mage Data™ is the leading solutions provider of data security and data privacy software for global enterprises. Built upon a patented and award-winning solution, the Mage platform enables organizations to stay on top of privacy regulations while ensuring security and privacy of data. Top Swiss Banks, Fortune 10 organizations, Ivy League Universities, and Industry Leaders in the financial and healthcare businesses protect their sensitive data with the Mage platform for Data Privacy and Security. Deploying state-of-the-art privacy enhancing technologies for securing data, Mage Data™ delivers robust data security while ensuring privacy of individuals. Visit the website to explore the company’s solutions.