Contemporary and beneficial uses of Big-Data Systems

Big data is a term applied to datasets whose size or type is beyond the ability of traditional relational databases to capture, manage, and process with low latency (IBM, 2018). The V’s of Big Data: these datasets share one or more of the common V’s, known as Volume, Velocity and Variety. Additionally, Veracity, Variability, Visualisation and Value are often added to the core list as drivers towards a more rounded architecture....

May 13, 2020 · 3 min · 619 words · Andrew

Data Warehouses vs Data Marts

Although the terms “data warehouse” and “data mart” sound similar, they are quite different. It is important to first understand how they differ in order to define some characteristics and practical applications for each. Serra (2012) describes a data warehouse as “a single organizational repository of enterprise-wide data across many or all subject areas”, while Inmon classifies the data mart as “a logical view or a physical extract of a larger data warehouse, usually isolated for the need to have a special data model or schema” (n....

April 22, 2020 · 3 min · 594 words · Andrew

Apache Kafka’s Role in Big Data Streaming Analytics

The world of Big Data started out as a way of storing and querying amounts of information far beyond what was achievable in years past. However, most of the value in that data is found in the real-time or streaming information presented as it first enters the system. As data gets old, it also gets stale and less useful to many business systems. Streaming analytics platforms have come a long way, with numerous open source projects offering advanced products such as Flink, Spark Streaming, Samza and Storm, all of which are at the forefront of the arena in their respective strengths....

April 17, 2020 · 3 min · 558 words · Andrew

Big Data Security and Privacy Issues

Big Data is commonly described by its V properties or characteristics, of which Velocity, Volume and Variety are the most frequently cited. Accounting for security issues and privacy implications with such large datasets is a challenge that needs a repeatable framework to cover all areas. Volume is almost certainly the most highly targeted characteristic, as the expression Big Data fundamentally implies a voluminous amount of information or data that needs to be processed (McCafferty, 2013)....

March 2, 2018 · 3 min · 576 words · Andrew

Netflix Hadoop Big Data Marketing Use Case

Netflix is a video streaming service that has a wealth of information about its user base’s likes, dislikes, general consumer habits, retention lengths and much more. Netflix uses its big data to commission original programming content that it knows will succeed and be accepted in relevant published markets (O’Neill, 2016). It performs various A/B tests to determine which of several similar variants performs better; for example, when showing cover images for series or movies, it shows alternative images at random to determine which draws the stronger reaction from its user base....

January 28, 2018 · 3 min · 562 words · Andrew

Using Hadoop to manage Dark Data

Dark Data is the biggest piece of the pie (Datumize, n.d.) when it comes to Big Data and what lies beneath huge datasets of collected information. IBM has stated in a report that over 80 percent of all data is dark and unstructured, meaning that it is simply too much data to process, analyse or unlock valuable information from. This data is termed Dark Data; it is mostly unstructured and often missing or incomplete, which leaves a lot of potential for solutions to be built around high-volume analytics and processing....

January 19, 2018 · 3 min · 612 words · Andrew