I don’t quite know when it happened, but we have recently added another V to the three existing characteristics of big data. Perhaps more than one. Gartner analyst Doug Laney gave us the first batch: the volume of high-scale data, the velocity of real-time and rapidly changing data, and the variety of unstructured data. This certainly set big data apart and at least partially explained why the old tech combination of columns, rows and SQL was no longer big, strong or fast enough to deal with it. We needed Hadoop, columnar stores, NoSQL, massively parallel processing and other innovations to deal with the full three V’s.
More recently, veracity has qualified for this somewhat exclusive club. Dealing with notions of sentiment, mentions and sociographics from tweets, Facebook status updates and YouTube comments is an imprecise practice, very unlike traditional data processing where all transactions balance and net out to zero. According to IBM, one in four business leaders do not trust the data they make decisions on, and this new world is unlikely to make them feel any less queasy.
A quick search will find other candidate V’s, including visualisation. Indeed, one source suggests we are up to six V’s, but it is time to stop counting. Whilst classifying and characterising big data in this way is understandable, it is not completely helpful. In fact, according to WhereScape CEO Michael Whitehead, it perpetuates the stereotype of navel-gazing IT types. This ever-increasing collection of V’s is not strictly true either: some big data is not high volume, some is not real-time, and some might even have a little structure.
It also rather misses the point, to say nothing of the other twenty-five letters of the alphabet. Big data is certainly sourced from different places – from web sites, social platforms and machines on the internet of things. It is also certainly plentiful and strange. However, defining it in terms of where it has come from or how it is processed is a technicality. It would offer far more insight to discuss it in terms of how it can be used in retail, insurance and telecommunications.
Indeed, like many others, I can only really get behind one V: V for value. Like all data, the test is what you do with it once you have it. If the answer is to identify fraud, adjust an insurance premium in real-time, predict climate change patterns or alert a physician that a therapy regime is dangerously out of step, then we can see something of value. If the answer is nothing, then all that Hadoop’ing, NoSQL’ing, massively parallel’ing and V counting is mere idle curiosity.