How a Hadoop distribution can help you manage big data

Azharuddin
2017-06-05
  • Azharuddin

    Azharuddin - 2017-06-05

    Many companies are struggling to manage the massive amounts of data they collect. Whereas in the past they may have used a data warehouse platform, such conventional architectures can fall short when dealing with data that originates from numerous internal and external sources and often varies in structure and type of content. But new technologies have emerged to offer help -- most prominently, Hadoop, a distributed processing framework designed to address the volume and complexity of big data environments involving a mix of structured, unstructured and semi-structured data.

    Part of Hadoop's allure is that it consists of a variety of open source software components and associated tools for capturing, processing, managing and analyzing data. However, in order to help users take advantage of the framework, many vendors offer commercial Hadoop distributions that provide performance and functionality enhancements over the base Apache open source technology and bundle the software with maintenance and support services. As the next step, let's take a look at how a Hadoop distribution could benefit your organization.

    Making a case for a Hadoop distribution

    Hadoop runs in clusters of commodity servers and typically is used to support data analysis and not for online transaction processing applications. Several increasingly common analytics use cases map nicely to its distributed data processing and parallel computation model. The list includes:

    Operational intelligence applications for capturing streaming data from transaction processing systems and organizational assets, monitoring performance levels, and applying predictive analytics for pre-emptive maintenance or process changes.

    Web analytics, which are intended to help companies understand the demographics and online activities of website visitors, review Web server logs to detect system performance problems, and identify ways to enhance digital marketing efforts.

    Security and risk management, such as running analytical models that compare transactional data to a knowledge base of fraudulent activity patterns, as well as continuous cybersecurity analysis for identifying emerging patterns of suspicious behavior.

    Marketing optimization, including recommendation engines that absorb huge amounts of Internet clickstream and online sales data and blend that information with customer profiles to provide real-time suggestions for product bundling and upselling.

    Internet of Things applications, such as analyzing data from things -- like manufacturing devices, pipelines and so-called smart buildings -- via sensors that continuously generate and broadcast information about their status and performance.

    Sentiment analysis and brand protection, which might involve capturing streaming social media data and analyzing the text to identify unsatisfied customers whose issues can be addressed quickly.

    Massive data ingestion for data collection, processing and integration scenarios such as capturing satellite images and geospatial data.

    Data staging, in which Hadoop is used as an initial landing spot for data that is then integrated, cleansed and transformed into more structured formats in preparation for loading into a data warehouse or analytical database for analysis.
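
    To make the "distributed data processing and parallel computation model" mentioned above a little more concrete, here is a minimal sketch of the map, shuffle and reduce steps applied to the Web server log use case. It is plain Python running in a single process, not Hadoop's actual API; the log format, the sample lines and all function names are assumptions made up for illustration. On a real cluster, the map and reduce steps would run in parallel across many servers.

```python
from collections import defaultdict

# Hypothetical sample of web server log lines (invented for illustration).
LOG_LINES = [
    '10.0.0.1 - - [05/Jun/2017:10:00:00] "GET /index.html" 200',
    '10.0.0.2 - - [05/Jun/2017:10:00:01] "GET /missing" 404',
    '10.0.0.1 - - [05/Jun/2017:10:00:02] "GET /cart" 200',
    '10.0.0.3 - - [05/Jun/2017:10:00:03] "POST /checkout" 500',
]


def map_phase(line):
    """Emit one (status_code, 1) pair per log line."""
    # Assumes the HTTP status code is the last space-separated field.
    status = line.rsplit(" ", 1)[-1]
    yield status, 1


def shuffle(pairs):
    """Group emitted values by key, as the framework would between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key, values):
    """Collapse all values for one key into a single count."""
    return key, sum(values)


def count_status_codes(lines):
    """Run map, shuffle and reduce in sequence over the given lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())


if __name__ == "__main__":
    print(count_status_codes(LOG_LINES))
```

    A spike in 404 or 500 counts from a job like this is the kind of signal the Web analytics item above describes when it mentions reviewing server logs to detect performance problems.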

     
  • adam lee

    adam lee - 2017-11-10

    Today we live in a DATA world. Anything and everything we do on the internet is becoming a source of business information for organizations across the globe. The world has seen exponential growth in data over the last decade or so, and even more in the last three years. Hence, the industry has started to look for ways to handle this data and get business value out of it through data analytics. One such breakthrough is “HADOOP”.
    Yes, Hadoop is here to stay and lead the industry in giving businesses numerous ways to store, retrieve and analyze data.

     
