MapR Technologies Brings Converged Data Platform to Data-Sensitive Applications

By Community Team

Today’s companies are increasingly data-driven, and there is no sign that this data revolution will slow down anytime soon. From mobile devices and personal laptops to the rapidly growing Internet of Things (IoT), data touches every aspect of life and business around the world. The IDC Digital Universe Study “Big Data, Bigger Digital Shadows, and Biggest Growth in the Far East,” sponsored by EMC, forecast that by 2020 at least 1.7 megabytes of new information would be created every second for every human being on earth. The study also projected that about 4.4 zettabytes (a zettabyte is 10 to the 21st power bytes, or one billion terabytes) would have accumulated in the digital universe by then.

In this article, Jack Norris, Senior Vice President for Data and Applications at MapR Technologies, discusses the significant role of big data in today’s business processes, along with the nature and business benefits of a converged architecture.

The World (and Impact) of Big Data

Data shapes everyone’s personal decisions, and businesses are no exception: their processes are being radically changed by data processing, data analysis, and data management. The same IDC study estimated that by 2020 at least a third of all data would pass through the cloud, the network of servers connected to the Internet. By one estimate, a mere 10% increase in data accessibility can translate into $65 million in additional net income for a Fortune 1000 company. Even the White House recognizes the potential of big data and has invested $200 million in big data projects. Perhaps the most striking finding of the study, however, is that less than 0.5% of data was ever analyzed and used. This shows that the sheer volume of data, and the demands of real-time data, can be difficult to digest and assess.

The term “big data” refers to the large volume of structured and unstructured data that floods businesses in their day-to-day operations. The amount of data created globally is growing at an exponential rate. What matters, however, is not how much data is available but what businesses actually do with it. Properly analyzed and used, this high volume of diverse data can help businesses transform customer experiences, detect fraudulent behavior, reduce costs, and determine the causes of failures or issues.

According to a Harvard Business Review article, Fortune 1000 companies consider data processing and management a worthwhile investment. The article reported that 48.4% of the firms surveyed were achieving measurable results from their big data investments. The companies also said those investments delivered the most value by helping them decrease expenses, find new avenues for innovation, launch new products and services, add revenue, speed up current efforts, transform the business for the future, and establish a data-driven culture in the workplace. These benefits are best realized when firms partner with reputable enterprise software companies that can help store and organize their data.

The Importance of a Modern Data Fabric

As a leading provider of data solutions, MapR recognizes that today’s companies have to adapt to new technologies as applications increasingly dictate how data is stored and organized, which in turn changes business workflows and processes. “The new paradigm is that new technology has freed up businesses from the traditional constraints of commodity hardware,” said Norris. “Cloud infrastructure brought along a new era of how you architect networks, and companies need a new way to approach this,” he added.

Jack Norris, the Senior Vice President of Data and Applications at MapR Technologies

MapR’s data fabric is architected to add scale, speed, and data reliability to the core of its solution. “We have moved to this ubiquitous space where data can’t tolerate or function with just a single central point for metadata…instead, you need a global data fabric that can stretch across locations and can handle any type of data (including data in motion versus data at rest),” noted Norris.

Beyond the technology and machine learning that have helped drive modern data systems, companies constantly struggle to embrace new technologies without the exorbitant costs tied to them. Most IT budgets are projected to remain flat in the coming years, so innovation often has to be driven on a limited budget. “We offer a data fabric that can reduce the costs of running an existing legacy system while allowing the company to pursue innovation, which is a fairly high bar we have established for organizations,” Norris shared. This matters because, according to the recent O’Reilly book “Machine Learning Logistics,” 90% of the success of machine learning is tied to data logistics.

MapR’s Converged Data Platform can spare companies a great deal of stress when migrating to the cloud, because developers and administrators can use it to simplify standard problems encountered in the cloud and during migration. For MapR, shifting applications onto a common data layer is the ultimate enabler of business agility.

Using Cloud Resources to Control and Focus on Assets

Data processing can be divided into three types: real-time, near-real-time, and batch. Which type fits, and which tools are needed, depends largely on the user’s purpose. Real-time processing involves a continuous flow of data in and out of the system (e.g., bank transactions, which must be processed immediately so that balances stay accurate). Near-real-time processing also prioritizes speed but is more forgiving than real-time processing; the allowable delay typically ranges from seconds to a few minutes. Lastly, batch processing handles high volumes of data collected over a period of time. Billing and payroll systems are good examples, because their cycles run at set intervals.
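The three types differ mainly in when work happens relative to the arrival of data. The minimal Python sketch below illustrates this. It uses a hypothetical `Transaction` record and made-up function names that are not part of any MapR product. The same account totals are computed three ways: per event, per fixed time window (a common way to implement near-real-time processing), and once over a whole collected batch.

```python
from dataclasses import dataclass
from typing import Dict, Iterable, List


@dataclass
class Transaction:
    """Hypothetical bank transaction used only for illustration."""
    account: str
    amount: float
    timestamp: float  # seconds since an arbitrary start time


def process_real_time(balances: Dict[str, float], txn: Transaction) -> float:
    """Real-time: update the balance the moment each transaction arrives."""
    balances[txn.account] = balances.get(txn.account, 0.0) + txn.amount
    return balances[txn.account]


def process_near_real_time(txns: Iterable[Transaction],
                           window_seconds: float) -> List[Dict[str, float]]:
    """Near-real-time: group transactions into short time windows (micro-batches)."""
    windows: Dict[int, Dict[str, float]] = {}
    for txn in txns:
        window = int(txn.timestamp // window_seconds)  # which window this event falls in
        totals = windows.setdefault(window, {})
        totals[txn.account] = totals.get(txn.account, 0.0) + txn.amount
    # Emit one result per window, in time order.
    return [windows[w] for w in sorted(windows)]


def process_batch(txns: Iterable[Transaction]) -> Dict[str, float]:
    """Batch: process everything collected over a period (e.g., a billing cycle) at once."""
    totals: Dict[str, float] = {}
    for txn in txns:
        totals[txn.account] = totals.get(txn.account, 0.0) + txn.amount
    return totals
```

For example, suppose three transactions arrive at 0, 30, and 70 seconds. The real-time path updates a balance three times. A 60-second window produces two partial results. The batch path produces a single total at the end of the period.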

Whatever the type, processing data is rarely easy, which is why businesses should take advantage of data platforms to achieve true flexibility.

“We are looking at taking advantage of cloud resources in a way that we can still control and focus on assets with flexibility in mind, and then further understand where you can operate and execute, which can be dictated to help optimize cost or performance,” Norris shared. He believes that the real power of business transformation lies in injecting analytics into business operations, so that analytics are not just data streams but actionable points that help companies operate and adjust in real time. “For example, by having deep historical data at your disposal, it is easier to recognize anomalies in newly arriving data,” he added.
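Norris’s point about historical data can be made concrete with one simple, commonly used technique: a z-score check. This sketch is an assumption for illustration, not MapR’s method. Historical values establish a baseline mean and standard deviation. Any newly arriving value that lies more than `threshold` standard deviations from that mean is flagged. The function name and the default threshold of 3.0 are hypothetical choices.

```python
import statistics
from typing import List, Sequence


def find_anomalies(history: Sequence[float],
                   new_values: Sequence[float],
                   threshold: float = 3.0) -> List[float]:
    """Flag new values far from the historical baseline (simple z-score rule)."""
    if len(history) < 2:
        raise ValueError("need at least two historical values to estimate spread")
    mean = statistics.mean(history)    # baseline level from historical data
    stdev = statistics.stdev(history)  # baseline spread (sample standard deviation)
    if stdev == 0:
        # Perfectly constant history: any deviation at all is unusual.
        return [v for v in new_values if v != mean]
    return [v for v in new_values if abs(v - mean) / stdev > threshold]
```

Suppose the history hovers around 10 (for example `[10.0, 11.0, 9.0, 10.0, 10.0, 11.0, 9.0, 10.0]`). A new reading of 25.0 is then flagged, while 10.5 and 9.2 are not. The deeper the history, the more reliable the baseline estimate becomes.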

Companies are especially concerned about data lock-in: the situation where data is easy to get in but hard to get out, effectively “locking” a company in with its cloud provider. Companies want to take advantage of the flexibility of cloud infrastructure while still protecting their data assets. By offering a solution that lets data flow smoothly from on-premises systems and across cloud providers, MapR Technologies attracts organizations with direct benefits such as cost reduction, better performance, higher availability, and compliance with government regulations.

MapR’s solution also gives developers a general-purpose database. The database was re-architected to eliminate the brittleness of HBase while keeping the HBase API, so that it can still take advantage of the open source ecosystem. The latest update adds a multi-model database that supports both analytics and operations. The architecture also supports a host of standard APIs, so existing applications can be leveraged as well.

MapR’s Converged Data Platform

MapR’s Converged Data Platform is a data fabric that works across both data in motion and data at rest, in all locations. Because the fabric is flexible, it scales directly as nodes and cloud resources are added, and all of these nodes participate in recovery.

One of the best parts about this platform is that companies do not need to deploy the whole fabric: they can choose select functionality for files, tables, or streams, and also deploy across locations (including the edge). Norris explained further: “This involves not only collecting data and deciding how to filter or aggregate the data locally before streaming it to a central location, but also how to process all of the data centrally to spot trends and anomalies. We push that intelligence back to the edge to have very fast local processing to respond quickly and appropriately to interesting events and data signatures.”
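The edge-to-center loop Norris describes has three steps: filter and aggregate locally, analyze centrally, and push the result back to the edge. The sketch below is a generic Python illustration of that loop under stated assumptions, not MapR code. The `EdgeSummary` record, the valid sensor range of -50 to 150, and the "factor × global mean" alert rule are all hypothetical placeholders.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass
class EdgeSummary:
    """Compact summary an edge site streams upstream instead of raw readings."""
    site_id: str
    count: int
    mean: float
    peak: float


def summarize_at_edge(site_id: str, readings: Sequence[float],
                      low: float = -50.0, high: float = 150.0) -> Optional[EdgeSummary]:
    # Filter locally: drop readings outside a plausible sensor range.
    valid = [r for r in readings if low <= r <= high]
    if not valid:
        return None  # nothing worth streaming to the central location
    # Aggregate locally: send one summary rather than every reading.
    return EdgeSummary(site_id, len(valid), sum(valid) / len(valid), max(valid))


def compute_alert_threshold(summaries: Sequence[EdgeSummary], factor: float = 2.0) -> float:
    # Central step: combine all sites into a count-weighted global mean.
    # The "factor x global mean" rule is a hypothetical stand-in for real central analytics.
    total = sum(s.count for s in summaries)
    if total == 0:
        raise ValueError("no readings to compute a threshold from")
    global_mean = sum(s.mean * s.count for s in summaries) / total
    return global_mean * factor


def should_alert_at_edge(reading: float, threshold: float) -> bool:
    # Edge step: apply the centrally computed threshold locally, with no round trip.
    return reading > threshold
```

In this design, only summaries travel to the center and only a single number travels back. The edge can therefore react immediately, while the center still sees trends across all sites.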

The MapR Converged Data Platform is composed of MapR-XD for files, MapR-DB for tables and documents, and MapR Streams. Integrating streams into the data fabric enables publish-and-subscribe capabilities. It also allows a stream to be retained for years, so companies can reuse that data in future developments. The Converged Data Platform brings together three major layers:

  1. Enterprise-Grade Platform Services: At the core of the converged platform are the MapR Platform Services. These give users a set of file and data management services that include global namespace, high availability, data protection, self-healing, access control, real-time performance, secure multi-tenancy, and management and monitoring.
  2. Open Source Engines and Tools: Built on Apache open source ecosystem projects, the open platform gives developers added flexibility, including support for multiple versions of key Apache projects. The MapR Converged Data Platform lets multi-tenant users update components at their own pace.
  3. Commercial Engines and Applications: The platform gives developers open APIs and interfaces. This enables the deployment of commercial software from vendors such as SAP, SAS, HP, and Cisco for large-scale applications. For small teams, it means the capacity to build enterprise-grade software products with the built-in protections of the MapR platform.
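The publish-and-subscribe model and long stream retention mentioned above can be illustrated with a toy in-memory topic. This is a conceptual Python sketch only. It is not the MapR Streams API, and the `Topic` class and its methods are hypothetical. Publishers append messages to a log, and each subscriber tracks its own read position independently. Messages are kept until they exceed the retention period, so a new consumer can replay everything still retained.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Topic:
    """Toy publish/subscribe topic backed by an append-only, time-retained log."""
    retention_seconds: float
    _log: List[Tuple[int, float, str]] = field(default_factory=list)  # (offset, timestamp, message)
    _next_offset: int = 0
    _positions: Dict[str, int] = field(default_factory=dict)  # consumer -> next offset to read

    def publish(self, message: str, timestamp: float) -> int:
        """Append a message and return its offset; publishers never wait for subscribers."""
        offset = self._next_offset
        self._log.append((offset, timestamp, message))
        self._next_offset += 1
        return offset

    def subscribe(self, consumer: str, from_offset: int = 0) -> None:
        """Register a consumer; from_offset=0 replays everything still retained."""
        self._positions[consumer] = from_offset

    def poll(self, consumer: str) -> List[str]:
        """Return messages this consumer has not read yet and advance its position."""
        start = self._positions[consumer]
        messages = [msg for offset, _, msg in self._log if offset >= start]
        self._positions[consumer] = self._next_offset
        return messages

    def expire(self, now: float) -> None:
        """Drop messages older than the retention period."""
        cutoff = now - self.retention_seconds
        self._log = [entry for entry in self._log if entry[1] >= cutoff]
```

Here the retention is set in seconds so the behavior is easy to follow. A retention period of years, as described for the platform, would let a model-training job subscribe long after the data was published and still read the full history.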

The Future of AI, IoT, and Big Data

MapR recently partnered with C3 IoT to accelerate the development of Artificial Intelligence (AI) and Internet of Things (IoT) applications, a collaboration driven by the need for more intelligence. Machine learning has reached a point where an estimated 90% of its success depends on data logistics: how data is made available for training and how data flows are managed. Norris says the future of analytics and machine learning will depend on a fabric that supports both operations and analytics, citing the company’s data fabric as an example.

About MapR Technologies
MapR Technologies is an enterprise software company headquartered in San Jose, California. It provides the Converged Data Platform, which gives customers access to a wide variety of data sources by combining real-time analytics with operational applications. With partners such as Amazon, Cisco, Google, Microsoft, and SAP, the MapR ecosystem gives companies a competitive edge through its data management platform. MapR recently received $56 million in new funding to help execute its vision, with support from investors including Alphabet, Google’s parent company.