Just Stop with the ‘Big Data is Just’

OK, I get it. You’re sceptical. You’ve seen stuff come and you’ve seen it go. To you big data is just BI, just data, just analytics for the hip kids, just a distraction or just hype and fad.

Except it isn’t. Big data is only ‘just’ analytics in the same way that cloud is ‘just’ ASP or bureau. That is to say, it isn’t at all.

It ain’t Hadoop either

Others define it in terms of the technology. I get this too. New tech is making it all possible and existing databases have been a barrier. New approaches like Hadoop were borrowed from those that pioneered extracting value from enormous volumes of data. To the traditional data vendors, a terabyte was a big deal. They failed to notice that this was becoming standard in a home PC and that insurgent innovators were capturing, processing and mining mountains of data. They didn’t keep up, so others had their lunch money and now they are playing catch-up.

But it would be wrong to define big data in terms of the innovation that allows it to happen. A little like defining fine dining as an activity conducted with knives, forks and a high quality napkin. It would be the most common mistake of the Big Data muggle.

The end of transaction oriented business

So if it’s not just ‘just’ and it’s not the technology … what is it?

It’s nothing less than a profound change in our approach to data. Historically, businesses managed themselves as a series of transactions. Occasional snapshots, if you will. Only the essential financial and operational interactions between them and their customers. A quotation, an order, a despatch note and, most importantly, an invoice. Early online commerce began to change this. Every gesture a customer made on their shopping journey could be captured. An abandoned basket in a supermarket tells the store manager nothing. Online, the same shopping cart could tell us that the delivery times are too long, the accessories were out of stock or that the secure shopping statement was in the wrong place. For the first time, so much data was being generated that ‘traditional’ analytics started to creak and groan, and most of this type of analysis took place outside of corporate BI. It was ‘special’ clickstream data, it needed specialised tools, and the BI specialist and vendor shook their heads at its lack of structure. Where were the columns, rows and indexes?

This was just the beginning. Social platforms don’t just allow the analysis of shopping behaviours but all behaviours. If a customer comments, complains, compliments or converses in general about you or your brand, it is possible to know. It’s no longer hearsay or anecdote, it’s available from the blogosphere or the Twitter firehose. It’s data.

Another beginning

Actually, that was just the beginning of the beginning. New classes of devices that can generate more data than the most active surfer or shopper are boosting the online population. Forget smart meters and the internet fridge, at least for now. Think more about ultra-low cost devices that remind you to water the yucca, feed the guppy or take your medication. If you forget any of these, particularly the medication, they will probably tell others too. Connected asthma inhalers can provide insight into air quality, and connected cars can report to your insurers, who adjust your premiums because your acceleration and braking patterns suggest that you are driving as if you were on a track day rather than on the Hanger Lane gyratory. Oh, and my new Pebble watch (when it arrives) will add to the billions of facts, snippets and streams being added to that one big database in the sky. The cloud.

Ambient Data and why Big Data is Big

Big data represents a profound change. In our book Decision Sourcing, Gower, 2013, we refer to it as ‘ambient’ rather than big data. Ambient because we have always been surrounded by our thoughts, gestures, actions and conversations but they have never been data before. They were lost (as Rutger said) ‘like tears in rain’.

Today, we are approaching an age where it is possible and practical to know everything that there is to know. Everything that is (to use an arcane legal expression) ‘uttered and muttered’. That’s what makes it big. Really big. Teradata think a tera is big but it’s just a walk to the shops compared to Big Data.

Oh no, it is not ‘just’ anything. It is the beginning of the most significant shift in our industry since it began. The complexities are many and the data is as varied as it is voluminous, but the prize is knowledge and insight, much of it predictive. Indeed, everything we have done to this point has been in preparation for the age of Big Data.

If Big Data is just anything right now … it’s just the beginning.

Information Curation 1 dot 2

At the end of a jetty on a beach near Benesse House, the big, fat and very cool Kabocha

Dot to Dot: In the previous Post

In part one we examined how the curatorial process is relevant to the way in which businesses make informed decisions. We examined how Frances Morris, curator of the Kusama exhibition at the Tate Modern in 2012, dealt with abundance, the most pressing issue for those of us dealing with exponentially increasing data volumes today. We also saw that curation has parallels with analysis: a process that starts with very few assumptions, perhaps an inkling that there is a story to tell, but then becomes more focused as evidence is sifted, examined and understood.

In this second part, we look at filtering, relevance and how the curatorial process helps us understand which comes first … data or information.

Relevance not Completeness

As I listened to Morris at the Tate, it was clear that the story she wanted to tell was as much a product of the things she left out as it was the things she included. Morris described how she visited a site on the Japanese island of Naoshima to see an example of Kusama’s famous pumpkins. It was perched at the end of a pier, jutting into the Inland Sea, and she decided that to take it out of context would be to lose something of the truth. This led to perhaps her most controversial decision amongst Kusama’s many fans: not to include one of Kusama’s recurring themes in the summer exhibition. The pumpkins, like the most frequently used data, were popular. They were well known and well understood. However, they didn’t bring anything new. At the end of the pier, they were relevant and contextual. In an exhibition intended to deliver insight into ‘Kusama’s eras’, the key points at which the artist had reinvented herself, they added nothing new.

Story First

One of the most telling characteristics of Morris’s curatorial process was that the story she wanted to tell was not limited by the art. Kusama was a leader in the 1960s New York avant-garde movement. She was outlandish and outspoken, sometimes shocking. Not all of this is obvious from her art but it was an important thread in Morris’s story. To remedy this she chose to exhibit documents and papers that gave Kusama a voice. Clippings, letters and personal artefacts enriched the story. The result was a much more complete picture of an artist whose influence on culture and society had as much to do with her activism, performance art and outrageous ‘happenings’ as her art.

Sometimes, as analysts, we limit our story by what is in the database or data warehouse. Smart decisions should be informed but that doesn’t mean to the exclusion of other forms of knowledge. That which is anecdotal and tacit alongside the ‘facts’ might provide a more complete and accurate picture. Information exists outside of columns and rows.

Joining the Dots

Does the curatorial process deliver insight? Does it ultimately leave its visitors with the ‘facts’, insofar as we can know them, as they relate to life and art? The test would be Kusama’s reaction to Morris’s exhibition when she visited for a private viewing before it was opened to the public. It seems the answer is an overwhelming yes. At one point, as Morris walked Kusama around the exhibition, the artist wept. The collection, which spanned nine decades of an extraordinary life, had struck a deep and personal chord. This visceral reaction was an acknowledgement that it was an essential truth, from perhaps the only one who knew, in this case, what the truth really was.

Knowledge does not leap off a computer screen or printed page any more than the life of an artist leaps off a gallery wall. It is a synthesis of data and information. To deliver a report, chart or scorecard is not to deliver knowledge. The job is only part done. The information needs to be socialised, discussed, debated and supplemented with what we know of our customers and products. Neither is the process just ‘analysis’. It is one of selecting that which is relevant, excluding that which is not and enriching with the experiences and opinions of those in the business whose expertise is not captured in rows and columns. In a world where we are overwhelmed with information, knowledge and understanding require curation.

The nine decades of Yayoi Kusama at the Tate. 

Frances Morris discusses and explores Yayoi Kusama’s life and work. Taking the audience through her curatorial processes, Morris will map out the exhibition from its origins to completion. The curator will also reflect on her personal journey with Kusama, having had the opportunity to work closely with her over the last three years.

Information Curation: 1 dot 1

Connecting the Dots

On an uncharacteristically warm summer evening in 2012 I made my way into the Tate Modern as everyone else was making their way out. It was part of my work to understand the curatorial process and its relevance to information management through one of the Tate’s infrequent but excellent curator talks. This one, from Frances Morris, concerned the recent and enormously popular Kusama exhibition.

 

The notion that curation is an emerging skill in dealing with information is not a new one. It is covered by Jeff Jarvis in his blog post ‘Death of the Curator. Long Live the Curator’, where Jarvis applies it to the field of journalism. It is also the subject of Steven Rosenbaum’s excellent book ‘Curation Nation’, which examines the meme more broadly.

 

Abundance

Japanese artist Yayoi Kusama is prolific. Her work spans the many decades of her life, first in rural Japan, then New York in the 1960s and in contemporary Tokyo today. It is enormously varied. Her signature style of repeating dot patterns, whilst the most famous, represents only a small part of a vast and sprawling body of work. It is the perfect artistic allegory for information overload. Kusama has too much art for any one exhibition in the same way that information professionals in the age of Big Data have too much information for any one decision.

 

Morris, I figured, must have wrestled with Kusama’s prodigious nature. The problem is not one of assembling a coherent and factual account. Instead, it is one of separating that which is relevant from that which is extraneous. It is a process of forming a series of working hypotheses and building a story that is a reality, that is a ‘truth’.

 

Analysis and Curation

Like many managers, Morris had a vague sense of the story she wanted to tell but the final story could only be told through material facts, works or ‘data’. At first, she considered, selected, dissected and parsed as much as possible. Over time Morris selected works through more detailed research. She travelled extensively, spending time with Kusama herself in a psychiatric institution which has (voluntarily) been Kusama’s home since 1977. She also visited locations important to Kusama, including her family home and museums in Matsumoto, Chiba and Wellington, New Zealand, where others had curated and exhibited her work. This parallels the analytical process: one of starting with very few, if any, assumptions and embarking on a journey of discovery. Over time, through an examination of historical and contemporary data points, the story begins to unfold.

 

In the Next Post (1 dot 2)

Already we can see that curating is a process of research and selection. It has strong parallels with the early stages of information analysis. In the next post we will look at filtering, relevance and how the curatorial process helps us understand which comes first … data or information.

EA: Why Being Worst Matters More than They Think

It seems that beating the tobacco companies and those behind environmental negligence to the title of ‘Worst Company in America’ has not been an exercise in humility for Electronic Arts.

 

In a statement to gaming website Kotaku, EA said “We’re sure that British Petroleum, AIG, Philip Morris, and Halliburton are all relieved they weren’t nominated this year. We’re going to continue making award-winning games and services played by more than 300 million people worldwide.”

 

The statement was described as arrogant and dismissive by Paul Tassi, Forbes contributor. I would add short-sighted too.

 

EA are pointing to their worldwide sales achievement to dismiss the vote as inconsequential. However, what they are forgetting in their hubris is that sales is the classic ‘lagging’ indicator. Sales are recorded monthly and publicly announced quarterly and annually in most businesses. Sentiment, on the other hand, is a leading indicator. A dip in employee engagement means that customers are about to become unhappy. A dip in customer sentiment means that your sales are about to be hit. Robert Kaplan and David Norton introduced the business world to this cause-and-effect chain decades ago. Customers drive revenues, your business produces value that your customers love or hate, your staff drive the business, your investment in your staff motivates or demotivates them. Simple but a point that the EA spokesman appears to be missing.

 

Now I don’t know the extent to which gamers are about to vent their ire but I do know when a company has spoken too soon. And EA have. EA should reflect on the feedback. Their customers are telling them that they don’t feel respected, that their culture is corporate over creativity, that they are emptying wallets but giving only the bare minimum back.

 

In the light of that sentiment, they should really not be sitting on laurels made of last quarter’s or last year’s sales. Those sales are gone. Sentiment like this can gather momentum, capture the imagination of a well connected community and have far reaching consequences down the line. EA should have thought before they spoke. The impact of the ignominy behind this award is yet to be felt.

Decision Making problems are not new, in fact they are centuries old

Frank Buytendijk delivered a great keynote at 8am in Las Vegas at the TDWI conference in February 2012. He avoided the technicalities of data architectures, the rigours of data modelling and the disciplines of agile methods.

 

Instead, over breakfast, he dipped into the world of philosophy and asked us to consider the centuries-old problems of what is true, what is real and what is good.

 

Referring to Plato, Thales and Machiavelli, Buytendijk led us through some fundamentals about decision making.

What is True?

Firstly decisions are not just about the data. Do we decide to pay for parking because we calculate the cost of a ticket against the cost of a fine but factored by the risk of getting a fine? Or do we do it because we think it is the ‘right’ thing to do, the ‘civic’ thing to do?
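The purely calculating half of that question can be sketched in a few lines of Python. This is only an illustration of the arithmetic: the ticket price, the fine and the chance of being caught are invented figures, and the function names are mine rather than anything from the keynote.

```python
def expected_cost_of_not_paying(fine: float, p_fine: float) -> float:
    """Cost of the fine weighted by the chance of actually getting one."""
    return fine * p_fine


def calculated_choice(ticket: float, fine: float, p_fine: float) -> str:
    """Pick whichever option is cheaper on average. Ignores the 'civic' motive entirely."""
    risk = expected_cost_of_not_paying(fine, p_fine)
    return "pay" if ticket <= risk else "risk it"


# Hypothetical figures: a 3.00 ticket, a 60.00 fine.
print(calculated_choice(ticket=3.00, fine=60.00, p_fine=0.10))  # wardens are busy
print(calculated_choice(ticket=3.00, fine=60.00, p_fine=0.02))  # wardens are rare
```

Notice what the sketch cannot express: the driver who pays when the wardens are rare because it is the right thing to do. That is the point being made about decisions and data.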

 

What is Real?

So often, even with all the dashboards, scorecards, reports and charts, senior executives don’t seem to know what’s going on. Like in Plato’s Cave, the shadows on the wall are not reality, they are representations of reality. How much could really be told by listening to our customers directly rather than waiting for analysis much later?

 

What is Good?

Predictive analytics can provide great information that allows micro-segmentation. For example, it could help an insurance company to identify those most likely to claim on their insurance policy for back and neck strain based on their online behaviours. Increasing their premiums might protect the business from additional costs but the insurance business model is about distributing the risk, not identifying it perfectly. Taken to its conclusion, there is then no need for insurance; we all pay for the cost of our health care as and when it happens. However, if the insurance company used this information to promote lifestyle changes for this group then ethics and business models are aligned.

 

What’s it all about?

Buytendijk’s quirky, thought-provoking start to the TDWI conference tells us that in IT we are wrestling with problems that preoccupied philosophers centuries ago. It also tells us, though, that in IT we can think too much and reflect too little.

Social Media Listening, Let’s Call the Whole Thing Off?

The couple in Ira Gershwin’s song ‘Let’s Call the Whole Thing Off’ lamented the way they pronounced the same words differently because it exposed class differences which might eventually be their undoing. Human communication is a funny thing. If Fred Astaire and Ginger Rogers had met on Facebook then regardless of how they pronounced neither, either and tomato, they would have assumed that they, like the spelling, were a perfect match.

Understanding nuance in human communication is a preoccupation for those of us building social media analytic applications and specifically as it applies to the Social Listening process. Social listening is the data collection process in a social media analytics application, the point at which the vast sea of blog, editorial and social media content is collected and converted into usable analysis. The purpose of Social Listening is to collect and filter ‘mentions’, instances of the company, brand, product or marketing campaign being referenced in an item of online content. Most platforms are good at collecting mentions but many fail in their level of accuracy, not because of scale and volume but because they don’t understand the human capacity for saying the same thing in so many different ways.

Fred and Ginger were both speaking (American) English and yet still had problems because language is only one of the many considerations when we try to understand the written word. Slang, regional idioms and differences in style relating to social groupings, profession, generation and gender are just a few others.

Anyone with teenage children can tell you about generational language differences. At one time my son and his friends frequently used the expression ‘you just got pwned’ or ‘he pwned me’, usually but not exclusively when gaming. It describes the process of being decisively and unambiguously beaten by a competitor. ‘Pwned’ is a corruption of ‘owned’ attributed to a misspelling by a World of Warcraft map designer and for some reason it fell into common usage. Unlike much of what we deal with in information systems, there is no rule, no derivation, it is simply something which is known. Without this knowledge what would a social media monitoring platform make of the tweet ‘coke pwns pepsi’ (or the other way around, of course)?

Other differences are equally obscure. Take emoticons. Baby boomers rarely use them, Gen-Xers commonly use them and Gen-Yers use them but differently. A Gen-Xer is more likely to use :-) and a Gen-Yer :) instead. Very little difference to the human eye but in traditional text filters they simply don’t match.

Many are a little surprised when I point out that the author’s gender makes a difference to the language used. Of course, women might be more likely to discuss hormone replacement therapies and men more likely to discuss male pattern baldness if they are blogging about their mid-life crisis but given a gender-neutral topic, men and women still use different language. One website, gender genie, can identify the gender of the author of a piece of text with a surprisingly high degree of accuracy.

What does all of this mean? It means that Social Media Analytics platforms have to understand the rich, inconsistent and unfathomable ways in which we all converse. To get more specific and technical, social listening must employ linguistic variant sets to accurately disambiguate language variations. Simply put, it must be able to handle a set of alternative ways of saying the same thing. Social listening must be inclusive of all diversity regardless of age, gender, ethnicity, social status, profession and yes, sexuality before it can capture data suitable for the purpose of analytics. Otherwise, you might as well just call the whole thing off.
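To make the idea of a variant set a little more concrete, here is a minimal Python sketch. It is an illustration under stated assumptions, not how any particular listening platform works: the variant sets, the canonical labels and the function names are all invented for the example, and real platforms need far richer handling of punctuation, context and ambiguity.

```python
# Hypothetical variant sets: each canonical term maps to alternative
# ways of saying the same thing (slang, misspellings, emoticon styles).
VARIANT_SETS = {
    "beats": {"beats", "owns", "owned", "pwns", "pwned"},
    "smile": {":)", ":-)", "=)"},
}

# Reverse lookup from any variant to its canonical term.
CANONICAL = {
    variant: canonical
    for canonical, variants in VARIANT_SETS.items()
    for variant in variants
}


def normalise(text: str) -> list[str]:
    """Lower-case, split on whitespace and map known variants to one canonical term."""
    return [CANONICAL.get(token, token) for token in text.lower().split()]


def same_meaning(a: str, b: str) -> bool:
    """Two mentions match if they normalise to the same token sequence."""
    return normalise(a) == normalise(b)


print(normalise("Coke pwns Pepsi :-)"))
print(same_meaning("Coke pwns Pepsi :-)", "coke beats pepsi :)"))
```

A plain text filter would treat the two tweets in the last line as unrelated. With even this crude variant set they collapse to the same thing, which is all that ‘disambiguating language variations’ means at its simplest.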

 

Also reproduced for IBM Vision for the IT expert community.

The Social Triangle: Business, Brand and Analytics

Mention social and we immediately think about the dizzying number of people using Facebook and, as businesses, how we reach them as customers or prospective customers.

 

This is only part of the story though. Today, my own business, a provider of information software and services, will not find its customers on Facebook however hard we look. However, this doesn’t mean that social isn’t relevant to us. That would be a limiting and ‘traditional’ view of Social as a Brand only, which is a single point on the Social Triangle.

 

Michael Brito, SVP of Edelman Digital, distinguished between the Social Business and the Social Brand in a recent article on Brainyard. The Social Brand, he argues, is a company, product, or individual that uses social technologies to communicate with social customers, their partners and constituencies, or the public. The Social Business, on the other hand, is one that has integrated and operationalized social media within job functions internally. The third point on the triangle is Analytics, the practical use of information to make decisions.

 

The aspiration is that both Brand and Business are for engagement not just broadcasting and that Analytics is used as actionable information. Let me offer an example.

 

I recently tried to book a London hotel room for my son because he had a very early train journey on the Eurostar. I wanted to pay so that it was one less thing for him to be concerned about at 4am. I made an advance reservation and, several days, calls, emails and faxes (yes, faxes) later, the hotel chain could still not confirm this part of the arrangement. Whilst I don’t do this often, I resorted to tweeting a #fail.

 

What happened next was pure Social Brand. A number of other hotel chains messaged me to offer me deals in London hotels. Indeed, they still do. It left an overriding impression that everyone listened but no one heard.

 

A Social Business with a Social Brand using Social Analytics would have behaved completely differently. The tweet would have appeared in a dashboard, tagged as negative sentiment and as relating to dissatisfaction with the booking process. Social Analytics would have been able to identify that I was a frequent traveller with children in university and that I was highly likely to use UK hotels over the coming 12 months. Social Analytics would also have been able to identify the level of influence I have with others in this socio-demographic group (not as high as I think).

 

The information would have been shared around the organisation, not just Marketing, and it would have been shared efficiently using social tools, not email. A customer services representative may have tried to resolve the specific problem for me but the general issue would have found its way to a manager responsible for the booking process, after which a decision would have been made either to fix this in their booking systems to attract other ‘surrogate bookers’, to continue to deal with it as an exception or even to do nothing. The next time a booking from a frustrated parent arrived, everyone would know how to handle it or what the policy was because the whole dialogue would have been captured and tagged in a searchable activity stream. The marketing team might even build a new campaign that focused on how they understand their customers better and the ease of parental bookings.

 

A Social Brand engages in meaningful dialogue with its customers; a Social Business engages a motivated workforce to fix problems or to exploit new opportunities. Finally, Social Analytics keeps the whole process informed with timely and relevant information so that the focus is on the right customers and products and so that effective, insightful and informed decisions are made.