About Big Data

Definition
The term has been in use since the 1990s, with some giving credit to John Mashey for coining it or at least making it popular. Big data usually includes data sets with sizes beyond the ability of commonly used software tools to capture, curate, manage, and process data within a tolerable elapsed time. Big data "size" is a constantly moving target, as of 2012 ranging from a few dozen terabytes to many petabytes of data. Big data requires a set of techniques and technologies with new forms of integration to reveal insights from datasets that are diverse, complex, and of a massive scale.

Big Data
Big data is new and “ginormous” and scary, very, very scary. No, wait. Big data is just another name for the same old data marketers have always used, and it’s not all that big, and it’s something we should be embracing, not fearing. No, hold on. That’s not it, either. What I meant to say is that big data is as powerful as a tsunami, but it’s a deluge that can be controlled . . . in a positive way, to provide business insights and value. Yes, that’s right, isn’t it?
Over the past few years, I have heard big data defined in many, many different ways, so I’m not surprised there’s so much confusion, misunderstanding, and misperception surrounding the term.
You won’t get far untangling your big data hairball if, for example, half of your company is forgetting to include traditional data in the calculus or if some don’t think social network interactions “really” matter. So, please, take a minute to get back to basics and do a simple self-check. Ask yourself, your team, the C-suite:
How do we define big data?
While I fully expect your company to add its own individual tweaks here or there, here’s the one-sentence definition of big data I like to use to get the conversation started:
Big data is a collection of data from traditional and digital sources inside and outside your company that represents a source for ongoing discovery and analysis.
Some people like to constrain big data to digital inputs like web behavior and social network interactions; however, the CMOs and CIOs I talk with agree that we can’t exclude traditional data derived from product transaction information, financial records, and interaction channels such as the call center and point-of-sale. All of that is big data, too, even though it may be dwarfed by the volume of digital data that’s now growing at an exponential rate.
In defining big data, it’s also important to understand the mix of unstructured and multi-structured data that comprises the volume of information.
Unstructured data is information that is not organized in a way traditional databases or data models can easily interpret, and it is typically text-heavy. Twitter tweets and other social media posts are good examples of unstructured data.
Multi-structured data refers to a variety of data formats and types and can be derived from interactions between people and machines, such as web applications or social networks. A great example is web log data, which includes a combination of text and visual images along with structured data like form or transactional information.  As digital disruption transforms communication and interaction channels—and as marketers enhance the customer experience across devices, web properties, face-to-face interactions and social platforms—multi-structured data will continue to evolve.
Industry leaders such as the global analyst firm Gartner use terms like “volume” (the amount of data), “velocity” (the speed at which information is generated and flows into the enterprise) and “variety” (the kinds of data available) to begin to frame the big data discussion. Others have focused on additional V’s, such as big data’s “veracity” and “value.”
One thing is clear: Every enterprise needs to fully understand big data (what it is to them, what it does for them, what it means to them) and the potential of data-driven marketing, starting today. Don’t wait. Waiting will only delay the inevitable and make it even more difficult to unravel the confusion.
Once you start tackling big data, you’ll learn what you don’t know, and you’ll be inspired to take steps to resolve any problems. Best of all, you can use the insights you gather at each step along the way to start improving your customer engagement strategies; that way, you’ll put big data marketing to work and immediately add more value to both your offline and online interactions.
Sampling
An important research question about big data sets is whether you need to examine the full data to draw certain conclusions about its properties, or whether a sample is good enough. The name big data itself refers to size, and size is an important characteristic of big data. But sampling makes it possible to select the right data points from within the larger data set to estimate the characteristics of the whole population. For example, about 600 million tweets are produced every day. Is it necessary to look at all of them to determine the topics discussed during the day? Is it necessary to look at all the tweets to determine the sentiment on each topic? In manufacturing, different types of sensor data, such as acoustics, vibration, pressure, current, voltage, and controller data, are available at short time intervals. To predict downtime, it may not be necessary to look at all the data; a sample may be sufficient. Big data can also be broken down by data point categories such as demographic, psychographic, behavioral, and transactional data. With large sets of data points, marketers can create and use more customized consumer segments for more strategic targeting.
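To make the tweet-sentiment question concrete, here is a minimal Python sketch of estimating a property of the whole population from a simple random sample. It assumes a hypothetical in-memory list of sentiment labels standing in for a day of tweets; the data, the roughly 30% positive share, and the function name `estimate_proportion` are illustrative only, not taken from any real dataset. The sketch compares the sample estimate with the value computed over the full data.

```python
import math
import random


def estimate_proportion(population, predicate, sample_size, seed=0):
    """Estimate the share of items in `population` that satisfy `predicate`
    from a simple random sample drawn without replacement.

    Returns (estimate, standard_error).
    """
    rng = random.Random(seed)
    sample = rng.sample(population, sample_size)
    hits = sum(1 for item in sample if predicate(item))
    p_hat = hits / sample_size
    # Normal-approximation standard error of a sample proportion; the
    # finite-population correction is ignored since sample_size << len(population).
    se = math.sqrt(p_hat * (1 - p_hat) / sample_size)
    return p_hat, se


# Hypothetical stand-in for one day of tweets: 1,000,000 sentiment labels,
# roughly 30% "positive". Real data would come from a tweet stream.
data_rng = random.Random(42)
tweets = ["positive" if data_rng.random() < 0.3 else "other" for _ in range(1_000_000)]

estimate, se = estimate_proportion(tweets, lambda t: t == "positive", sample_size=10_000)
true_share = tweets.count("positive") / len(tweets)
print(f"sample estimate: {estimate:.3f} +/- {1.96 * se:.3f} (approx. 95% interval)")
print(f"full-data value: {true_share:.3f}")
```

With 10,000 of 1,000,000 items the standard error is roughly half a percentage point, which is why a sample can often answer this kind of question well enough; rare topics or fine-grained subgroups need larger samples.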
There has been some work on sampling algorithms for big data; for example, a theoretical formulation for sampling Twitter data has been developed.
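The Twitter formulation referred to here is not reproduced. As a hedged illustration of the kind of algorithm this work involves, the sketch below implements reservoir sampling (the classic "Algorithm R"), a standard technique swapped in here for drawing a uniform sample from a stream, such as a tweet feed, whose length is not known in advance and which is too large to hold in memory. The function name `reservoir_sample` and the scaled-down stream of hypothetical tweet IDs are illustrative assumptions.

```python
import random
from typing import Iterable, List, TypeVar

T = TypeVar("T")


def reservoir_sample(stream: Iterable[T], k: int, seed: int = 0) -> List[T]:
    """Return up to k items chosen uniformly at random from a stream of
    unknown length, using O(k) memory (reservoir sampling, "Algorithm R")."""
    rng = random.Random(seed)
    reservoir: List[T] = []
    for i, item in enumerate(stream):
        if i < k:
            # Fill the reservoir with the first k items.
            reservoir.append(item)
        else:
            # Pick a slot uniformly from 0..i inclusive; the item is kept
            # only if the slot falls inside the reservoir (probability k/(i+1)).
            j = rng.randint(0, i)
            if j < k:
                reservoir[j] = item
    return reservoir


# Hypothetical, scaled-down stand-in for a day's tweet stream.
tweet_ids = (f"tweet-{n}" for n in range(600_000))
print(reservoir_sample(tweet_ids, k=5))
```

Because item i (counting from 0) enters the reservoir with probability k/(i+1), every item seen so far is equally likely to be in the sample at any point, and the stream is read only once.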