Big Data: Indescribably Large

Lost for words

I have been the first to be overly critical of those who define big data solely by size and (absence of) structure. That being said, it is inescapable that data volumes have reached an inflection point. In an article for the Wall Street Journal, Andrew McAfee makes a pretty startling observation. Data has gone from being measured in terabytes to petabytes and exabytes. He explains that in 2012 Cisco announced that its equipment was recording a zettabyte of data. Not startling so far and, in any case, outside of the circle of data geeks, few will have heard of a zettabyte. The more jarring fact is that the next metric for measuring data is the final one. After the zettabyte is a yottabyte (10 to the power of 24, since you asked) and then that’s it. We have literally run out of words to describe how big big data is.
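For anyone who wants the ladder in one place, here is a quick sketch in plain Python of the byte prefixes as defined by the 1991 SI list, which is exactly the point: the list simply ends at yotta.

```python
# The SI byte prefixes as of the 1991 list -- yotta is the last rung.
PREFIXES = [
    ("kilobyte", 3), ("megabyte", 6), ("gigabyte", 9),
    ("terabyte", 12), ("petabyte", 15), ("exabyte", 18),
    ("zettabyte", 21), ("yottabyte", 24),
]

for name, exponent in PREFIXES:
    print(f"1 {name:<9} = 10^{exponent} bytes")
# After 10^24 the 1991 list simply stops: no named prefix beyond yotta.
```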

Big v Different

Commentators such as Jeff Jonas and Kenneth Cukier make the point that big is not just big. Big can be different. David Weinberger, one of the authors of the Cluetrain Manifesto, makes a similar point in his book Too Big to Know. He proposes that knowledge has been shaped, perhaps even limited, by its medium. Until the invention of the printing press, only the most important, meticulously researched facts were committed to paper. Even then, the printed medium carried figurative and literal weight.

In describing Big Data in Decision Sourcing, we contrast transactional data with ambient data. Transactional data was limited by traditional data processing, originally in the form of the punch card and latterly the relational database. Ambient data, however, exists all around us. Its size meant that it went unobserved or at least uncaptured. This is what has changed. Affordable and available technology means that signals generated through the internet of things and human social interaction can be captured in digital form, providing new (and different) sources of insight. The relational database limited us to recording invoice lines and account details, whilst new forms of data management allow us to capture every human gesture, comment and click. Meanwhile, the machines are logging everything they do.
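To make the contrast concrete, here is a minimal sketch; the field names are invented for illustration, not taken from any particular system. A transactional record is a fixed set of columns, while an ambient event is a loosely structured signal captured as it happens.

```python
# A transactional record: the fixed columns a relational schema expects.
invoice_line = {
    "invoice_id": 10042,
    "sku": "WIDGET-7",
    "quantity": 3,
    "unit_price": 4.99,
}

# An ambient event: a loosely structured signal captured as it happens.
# Fields vary from event to event; structure is imposed at analysis time.
click_event = {
    "timestamp": "2013-04-02T09:17:43Z",
    "session": "a81f3c",
    "type": "hover",
    "target": "secure-shopping-badge",
    "dwell_ms": 2300,
}

print(sorted(invoice_line))   # a schema you could predict in advance
print(sorted(click_event))    # a shape you discover after the fact
```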

What’s next?

Metric prefixes were last updated in 1991 at the 19th General Conference on Weights and Measures and, beyond yotta, we got nothin’. Big Data means disruptive, transformational change in a way that we don’t completely understand today. In fact, we don’t even have a name for what comes next. Yet.

Just Stop with the ‘Big Data is Just’

OK, I get it. You’re sceptical. You’ve seen stuff come and you’ve seen it go. To you, big data is just BI, just data, just analytics for the hip kids, just a distraction or just hype and fad.

Except it isn’t. Big data is only ‘just’ analytics in the same way that cloud is ‘just’ ASP or bureau. That is to say, it isn’t at all.

It ain’t Hadoop either

Others define it in terms of the technology. I get this too. New tech is making it all possible and existing databases have been a barrier. New approaches like Hadoop were borrowed from those who pioneered extracting value from enormous volumes of data. To the traditional data vendors, a terabyte was a big deal. They failed to notice that this was becoming standard in a home PC and that insurgent innovators were capturing, processing and mining mountains of data. They didn’t keep up, so others took their lunch money and now they are playing catch-up.

But it would be wrong to define big data in terms of the innovation that allows it to happen. A little like defining fine dining as an activity conducted with knives, forks and a high-quality napkin. It would be the most common mistake of the Big Data muggle.

The end of transaction-oriented business

So if it’s not just ‘just’ and it’s not the technology … what is it?

It’s nothing less than a profound change in our approach to data. Historically, businesses managed themselves as a series of transactions. Occasional snapshots, if you will. Only the essential financial and operational interactions between them and their customers were recorded: a quotation, an order, a despatch note and, most importantly, an invoice. Early online commerce began to change this. Every gesture a customer made on their shopping journey could be captured. An abandoned basket in a supermarket tells the store manager nothing. Online, the same shopping cart could tell us that the delivery times are too long, the accessories were out of stock or that the secure shopping statement was in the wrong place. For the first time, so much data was being generated that ‘traditional’ analytics started to creak and groan, and most of this type of analysis took place outside of corporate BI. It was ‘special’ clickstream, needed specialised tools, and the BI specialists and vendors shook their heads at its lack of structure. Where were the columns, rows and indexes?
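As a sketch of the kind of question clickstream data can answer (the event log and event names below are hypothetical, invented for the example): find the sessions that added to the basket but never checked out, and look at the last thing each one saw.

```python
from collections import defaultdict

# A hypothetical clickstream: (session_id, event) pairs in time order.
events = [
    ("s1", "view:delivery-times"), ("s1", "add_to_cart"), ("s1", "view:delivery-times"),
    ("s2", "add_to_cart"), ("s2", "checkout"),
    ("s3", "add_to_cart"), ("s3", "view:accessories-out-of-stock"),
]

sessions = defaultdict(list)
for session_id, event in events:
    sessions[session_id].append(event)

# Abandoned baskets: sessions that added to the cart but never checked out.
for session_id, trail in sessions.items():
    if "add_to_cart" in trail and "checkout" not in trail:
        print(f"{session_id} abandoned; last event: {trail[-1]}")
```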

This was just the beginning. Social platforms don’t just allow the analysis of shopping behaviours but of all behaviours. If a customer comments, complains, compliments or converses in general about you or your brand, it is possible to know. It’s no longer hearsay or anecdote; it’s available from the blogosphere or the Twitter firehose. It’s data.

Another beginning

Actually, that was just the beginning of the beginning. New classes of devices that can generate more data than the most active surfer or shopper are boosting the online population. Forget smart meters and the internet fridge, at least for now. Think more about ultra-low-cost devices that remind you to water the yucca, feed the guppy or take your medication. If you forget any of these, particularly the medication, they will probably tell others too. Connected asthma inhalers can provide insight into air quality, and connected cars talk to your insurer, who adjusts your premiums because your acceleration and braking patterns suggest that you are driving like you are on a track day rather than on the Hanger Lane gyratory. Oh, and my new Pebble watch (when it arrives) will add to the billions of facts, snippets and streams being added to that one big database in the sky. The cloud.
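By way of illustration of the telematics idea, a minimal sketch; the readings and the ‘harsh’ threshold are invented for the example, not any insurer’s actual model.

```python
# Longitudinal acceleration samples in m/s^2 (positive = accelerating,
# negative = braking). These readings are invented for illustration.
samples = [0.4, 1.1, 3.8, -0.6, -4.2, 0.2, 2.9, -3.7, 0.1]

HARSH = 3.5  # an illustrative threshold, not an industry standard

harsh_events = [a for a in samples if abs(a) > HARSH]
rate = len(harsh_events) / len(samples)

print(f"{len(harsh_events)} harsh events in {len(samples)} samples "
      f"({rate:.0%}) -- track day or gyratory?")
```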

Ambient Data and why Big Data is Big

Big data represents a profound change. In our book Decision Sourcing (Gower, 2013), we refer to it as ‘ambient’ rather than big data. Ambient because we have always been surrounded by our thoughts, gestures, actions and conversations, but they have never been data before. They were lost (as Rutger Hauer said) ‘like tears in rain’.

Today, we are approaching an age where it is possible and practical to know everything that there is to know. Everything that is (to use an arcane legal expression) ‘uttered and muttered’. That’s what makes it big. Really big. Teradata think a tera is big, but it’s just a walk to the shops compared to Big Data.

Oh no, it is not ‘just’ anything. It is the beginning of the most significant shift in our industry since it began. The complexities are many, the data as varied as it is voluminous, but the prize is knowledge and insight, much of it predictive. Indeed, everything we have done to this point has been in preparation for the age of Big Data.

If Big Data is just anything right now … it’s just the beginning.

Information Curation: 1 dot 2

At the end of a jetty on a beach near Benesse House, the big, fat and very cool Kabocha

Dot to Dot: In the Previous Post

In part one we examined how the curatorial process is relevant to the way in which businesses make informed decisions. We examined how Frances Morris, curator of the Kusama exhibition at the Tate Modern in 2012, dealt with abundance, the most pressing issue for those of us dealing with exponentially increasing data volumes today. We also saw that curation has parallels with analysis: a process that starts with very few assumptions, perhaps an inkling that there is a story to tell, but then becomes more focused as evidence is sifted, examined and understood.

In this, the second part, we look at filtering, relevance and how the curatorial process helps us understand which comes first … data or information.

Relevance not Completeness

As I listened to Morris at the Tate, it was clear that the story she wanted to tell was as much a product of the things she left out as of the things she included. Morris described how she visited a site on the Japanese island of Naoshima to see an example of Kusama’s famous pumpkins. Perched at the end of a pier, jutting into the Inland Sea, she decided that to take it out of context would be to lose something of the truth. This led to perhaps her most controversial decision amongst Kusama’s many fans: not to include one of Kusama’s recurring themes in the summer exhibition. The pumpkins, like the most frequently used data, were popular. They were well known and well understood. However, they didn’t bring anything new. At the end of the pier, they were relevant and contextual. In an exhibition intended to deliver insight into Kusama’s ‘eras’, the key points at which the artist had reinvented herself, they added nothing new.

Story First

One of the most telling characteristics of Morris’s curatorial process was that the story she wanted to tell was not limited by the art. Kusama was a leader in the ’60s New York avant-garde movement. She was outlandish and outspoken, sometimes shocking. Not all of this is obvious from her art, but it was an important thread in Morris’s story. To remedy this she chose to exhibit documents and papers that gave Kusama a voice. Clippings, letters and personal artefacts enriched the story. The result was a much more complete picture of an artist whose influence on culture and society had as much to do with her activism, performance art and outrageous ‘happenings’ as her art.

Sometimes, as analysts, we limit our story by what is in the database or data warehouse. Smart decisions should be informed but that doesn’t mean to the exclusion of other forms of knowledge. That which is anecdotal and tacit alongside the ‘facts’ might provide a more complete and accurate picture. Information exists outside of columns and rows.

Joining the Dots

Does the curatorial process deliver insight? Does it ultimately leave its visitors with the “facts”, insofar as we can know them, as they relate to life and art? The test would be Kusama’s reaction to Morris’s exhibition when she visited for a private viewing before it opened to the public. It seems the answer is an overwhelming yes. At one point, as Morris walked Kusama around the exhibition, Kusama wept. The collection, which spanned nine decades of an extraordinary life, had struck a deep and personal chord. This visceral reaction was an acknowledgement that it was an essential truth from perhaps the only one who knew, in this case, what the truth really was.

Knowledge does not leap off a computer screen or printed page any more than the life of an artist leaps off a gallery wall. It is a synthesis of data and information. To deliver a report, chart or scorecard is not to deliver knowledge. The job is only part done. The information needs to be socialised, discussed, debated and supplemented with what we know of our customers and products. Neither is the process just ‘analysis’. It is one of selecting that which is relevant, excluding that which is not and enriching with the experiences and opinions of those in the business whose expertise is not captured in rows and columns. In a world where we are overwhelmed with information, knowledge and understanding require curation.

The nine decades of Yayoi Kusama at the Tate. 

Frances Morris discusses and explores Yayoi Kusama’s life and work. Taking the audience through her curatorial processes, Morris will map out the exhibition from its origins to completion. The curator will also reflect on her personal journey with Kusama, having had the opportunity to work closely with her over the last three years.

Information Curation: 1 dot 1

Connecting the Dots

On an uncharacteristically warm summer evening in 2012 I made my way into the Tate Modern as everyone else was making their way out. It was part of my work to understand the curatorial process and its relevance to information management through one of the Tate’s infrequent but excellent curator talks. This one, from Frances Morris, concerned the recent and enormously popular Kusama exhibition.

The notion that curation is an emerging skill in dealing with information is not a new one. It is covered by Jeff Jarvis in his blog post ‘Death of the Curator. Long Live the Curator’, where Jarvis applies the idea to the field of journalism. It is also the subject of Steven Rosenbaum’s excellent book ‘Curation Nation’, which examines the meme more broadly.

Abundance

Japanese artist Yayoi Kusama is prolific. Her work spans the many decades of her life, first in rural Japan, then New York in the ’60s, and in contemporary Tokyo today. It is enormously varied. Her signature style of repeating dot patterns, whilst the most famous, represents only a small part of a vast and sprawling body of work. It is the perfect artistic allegory for information overload. Kusama has too much art for any one exhibition in the same way that information professionals in the age of Big Data have too much information for any one decision.

Morris, I figured, must have wrestled with Kusama’s prodigious nature. The problem is not one of assembling a coherent and factual account. Instead, it is one of separating out that which is relevant from that which is extraneous. It is a process of forming a series of working hypotheses and building a story that is a reality, that is a ‘truth’.

Analysis and Curation

Like many managers, Morris had a vague sense of the story she wanted to tell, but the final story could only be told through material facts, works or ‘data’. At first, she considered, selected, dissected and parsed as much as possible. Over time Morris selected works through more detailed research. She travelled extensively, spending time with Kusama herself in the psychiatric institution which has (voluntarily) been Kusama’s home since 1977. She also visited locations important to Kusama, including her family home and museums in Matsumoto, Chiba and Wellington, New Zealand, where others had curated and exhibited her work. This parallels the analytical process: one of starting with very few, if any, assumptions and embarking on a journey of discovery. Over time, through an examination of historical and contemporary data points, the story begins to unfold.

In the Next Post (1 dot 2)

Already we can see that curating is a process of research and selection. It has strong parallels with the early stages of information analysis. In the next post we will look at filtering, relevance and how the curatorial process helps us understand which comes first … data or information.

Big Data Analytics: Size is not important

There was a time when databases came in desktop, departmental and enterprise sizes. There was nothing larger than ‘enterprise’ and very few enterprises needed databases that scaled to what was then the largest imaginable unit of data, the terabyte. They even named a database after it.

We now live in the world of the networked enterprise. Last year, according to IDC, the digital universe totalled 1.2 zettabytes of data. And we are only at the beginning of an explosion that is set to grow by as much as 40 times by 2020. Massive data sets are being generated by web logs, social networks, connected devices and RFID tags. This is even before we connect our fridges (and we will) to the internet. Data volumes are growing at such a clip that we needed a new term, Big Data (I know), to describe it.
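The arithmetic behind that forecast is worth a glance; a quick sketch using the figures quoted above (the annual rate is simply what a 40-fold rise over ten years implies):

```python
# IDC figures quoted above: 1.2 ZB, growing ~40x over the decade to 2020.
start_zb, factor, years = 1.2, 40, 10

end_zb = start_zb * factor
annual_growth = factor ** (1 / years) - 1  # implied compound growth rate

print(f"{start_zb} ZB -> {end_zb} ZB by 2020")
print(f"Implied compound growth: {annual_growth:.0%} per year")
```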

What is meant by ‘big’ is highly subjective, but the term is loosely used to describe volumes of data that cannot be dealt with by a conventional RDBMS running on conventional hardware. That is to say, alternative approaches to software, hardware or data architectures (Hadoop, MapReduce, columnar, distributed data processing etc.) are required.
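To make those alternative approaches a little more concrete, here is a toy sketch of the MapReduce pattern in plain Python. Real Hadoop distributes the map, shuffle and reduce phases across a cluster, which is the whole point, but the shape of the computation is the same.

```python
from collections import defaultdict

documents = ["big data is big", "data about data"]

# Map: emit (key, value) pairs from each input record independently.
mapped = [(word, 1) for doc in documents for word in doc.split()]

# Shuffle: group values by key (Hadoop does this across the cluster).
grouped = defaultdict(list)
for word, count in mapped:
    grouped[word].append(count)

# Reduce: collapse each key's values to a single result.
totals = {word: sum(counts) for word, counts in grouped.items()}
print(totals)  # {'big': 2, 'data': 3, 'is': 1, 'about': 1}
```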

Big Data is not just more of the same, though. Big Data is fundamentally different. It’s new, and new data can present new opportunities. According to the McKinsey Global Institute, the use of big data is a key way for leading companies to outperform their peers. Leading retailers, like Tesco, are already using big data to take even more market share.

This is because Big Data represents a fundamental shift from capturing transactions for analysis to capturing interactions. The sources of today’s analytic applications are customer purchases, product returns and supplier purchase orders, whilst Big Data captures every customer click and conversation. It can capture each and every interaction. This represents an extraordinary opportunity to capture, analyse and understand what customers really think about products and services, or how they are responding to a marketing campaign as the campaign is running.

Deriving analytics from big data, from content, unstructured data and natural language conversations requires a new approach. In spite of the name though, it’s less about the size and more about the structure (or absence of structure) and the level at which organisations can now understand their businesses and their customers.
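As a closing sketch of what analytics without columns and rows can look like (the word lists are toy stand-ins for a real sentiment lexicon): even a crude pass over natural-language comments yields a signal that no invoice table contains.

```python
# Toy sentiment pass over customer comments. The word lists below are
# illustrative stand-ins for a real sentiment lexicon.
POSITIVE = {"love", "great", "fast"}
NEGATIVE = {"slow", "broken", "late"}

comments = [
    "love the product but delivery was late",
    "great service, fast despatch",
    "arrived broken and support was slow",
]

for comment in comments:
    words = set(comment.lower().split())
    score = len(words & POSITIVE) - len(words & NEGATIVE)
    print(f"{score:+d}  {comment}")
```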