Information Curation: 1 dot 2

The big, fat and very cool Kabocha, at the end of a jetty on a beach near Benesse House

Dot to Dot: In the previous Post

In part one we examined how the curatorial process is relevant to the way in which businesses make informed decisions. We examined how Frances Morris, curator of the Kusama exhibition at the Tate Modern in 2012, dealt with abundance, the most pressing issue for those of us dealing with exponentially increasing data volumes today. We also saw that curation has parallels with analysis: a process that starts with very few assumptions, perhaps an inkling that there is a story to tell, but then becomes more focused as evidence is sifted, examined and understood.

In this, the second part, we look at filtering, relevance and how the curatorial process helps us understand which comes first … data or information.

Relevance not Completeness

As I listened to Morris at the Tate, it was clear that the story she wanted to tell was as much a product of the things she left out as of the things she included. Morris described how she visited a site on the Japanese island of Naoshima to see an example of Kusama’s famous pumpkins. Perched at the end of a pier, jutting into the Inland Sea, she decided that to take it out of context would be to lose something of the truth. This led to, perhaps, her most controversial decision amongst Kusama’s many fans: not to include one of Kusama’s recurring themes in the summer exhibition. The pumpkins, like the most frequently used data, were popular. They were well known and well understood. However, they didn’t bring anything new. At the end of the pier, they were relevant and contextual. In an exhibition intended to deliver insight into Kusama’s ‘eras’, the key points at which the artist had reinvented herself, they added nothing new.

Story First

One of the most telling characteristics of Morris’s curatorial process was that the story she wanted to tell was not limited by the art. Kusama was a leader in the 1960s New York avant-garde movement. She was outlandish and outspoken, sometimes shocking. Not all of this is obvious from her art, but it was an important thread in Morris’s story. To remedy this she chose to exhibit documents and papers that gave Kusama a voice. Clippings, letters and personal artefacts enriched the story. The result was a much more complete picture of an artist whose influence on culture and society had as much to do with her activism, performance art and outrageous ‘happenings’ as her art.

Sometimes, as analysts, we limit our story to what is in the database or data warehouse. Smart decisions should be informed, but that doesn’t mean to the exclusion of other forms of knowledge. That which is anecdotal and tacit, alongside the ‘facts’, might provide a more complete and accurate picture. Information exists outside of columns and rows.

Joining the Dots

Does the curatorial process deliver insight? Does it ultimately leave its visitors with the “facts”, insofar as we can know them, as they relate to a life and to art? The test would be Kusama’s reaction to Morris’s exhibition when she visited for a private viewing before it was opened to the public. It seems the answer is an overwhelming yes. At one point, as Morris walked Kusama around the exhibition, Kusama wept. The collection, which spanned nine decades of an extraordinary life, had struck a deep and personal chord. This visceral reaction was an acknowledgement that it was an essential truth, from perhaps the only one who knew, in this case, what the truth really was.

Knowledge does not leap off a computer screen or printed page any more than the life of an artist leaps off a gallery wall. It is a synthesis of data and information. To deliver a report, chart or scorecard is not to deliver knowledge. The job is only part done. The information needs to be socialised, discussed, debated and supplemented with what we know of our customers and products. Neither is the process just ‘analysis’. It is one of selecting that which is relevant, excluding that which is not and enriching with the experiences and opinions of those in the business whose expertise is not captured in rows and columns. In a world where we are overwhelmed with information, knowledge and understanding require curation.

The nine decades of Yayoi Kusama at the Tate. 

Frances Morris discusses and explores Yayoi Kusama’s life and work. Taking the audience through her curatorial processes, Morris will map out the exhibition from its origins to completion. The curator will also reflect on her personal journey with Kusama, having had the opportunity to work closely with her over the last three years.


Information Curation: 1 dot 1

Connecting the Dots

On an uncharacteristically warm summer evening in 2012 I made my way into the Tate Modern as everyone else was making their way out. It was part of my work to understand the curatorial process and its relevance to information management, through one of the Tate’s infrequent but excellent curator talks. This one, from Frances Morris, concerned the recent and enormously popular Kusama exhibition.

 

The notion that curation is an emerging skill in dealing with information is not a new one. It is covered by Jeff Jarvis in his blog post ‘Death of the Curator. Long Live the Curator’, where Jarvis applies the idea to the field of journalism. It is also the subject of Steven Rosenbaum’s excellent book ‘Curation Nation’, which examines the meme more broadly.

 

Abundance

Japanese artist Yayoi Kusama is prolific. Her work spans the many decades of her life: first in rural Japan, then New York in the 1960s, and contemporary Tokyo today. It is enormously varied. Her signature style of repeating dot patterns, whilst the most famous, represents only a small part of a vast and sprawling body of work. It is the perfect artistic allegory for information overload. Kusama has too much art for any one exhibition in the same way that information professionals in the age of Big Data have too much information for any one decision.

 

Morris, I figured, must have wrestled with Kusama’s prodigious nature. The problem is not one of assembling a coherent and factual account. Instead, it is one of separating out that which is relevant from that which is extraneous. It is a process of forming a series of working hypotheses and building a story that is a reality, that is a ‘truth’.

 

Analysis and Curation

Like many managers, Morris had a vague sense of the story she wanted to tell, but the final story could only be told through material facts, works or ‘data’. At first, she considered, selected, dissected and parsed as much as possible. Over time Morris selected works through more detailed research. She travelled extensively, spending time with Kusama herself in the psychiatric institution which has (voluntarily) been Kusama’s home since 1977. She also visited locations important to Kusama, including her family home and museums in Matsumoto, Chiba and Wellington, New Zealand, where others had curated and exhibited her work. This parallels the analytical process: one of starting with very few, if any, assumptions and embarking on a journey of discovery. Over time, through an examination of historical and contemporary data points, the story begins to unfold.

 

In the Next Post (1 dot 2)

Already we can see that curating is a process of research and selection. It has strong parallels with the early stages of information analysis. In the next post we will look at filtering, relevance and how the curatorial process helps us understand which comes first … data or information.

Decision making problems are not new; in fact, they are centuries old

Not Frank Buytendijk

Frank Buytendijk delivered a great keynote at 8am in Las Vegas at the TDWI conference in February 2012. He avoided the technicalities of data architectures, the rigours of data modelling and the disciplines of agile methods.

 

Instead, over breakfast, he dipped into the world of philosophy and asked us to consider the centuries-old problems of: what is true? what is real? and what is good?

 

Referring to Plato, Thales and Machiavelli, Buytendijk led us through some fundamentals of decision making.

What is True?

Firstly, decisions are not just about the data. Do we decide to pay for parking because we calculate the cost of a ticket against the cost of a fine, factored by the risk of getting a fine? Or do we do it because we think it is the ‘right’ thing to do, the ‘civic’ thing to do?
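A back-of-the-envelope sketch of that purely ‘rational’ calculation might look like the following; the figures and the ten per cent chance of being caught are invented for illustration.

```python
# Purely illustrative: the 'data-driven' view of the parking decision.
# All figures here are assumptions, not from the original post.
ticket_cost = 3.00    # cost of paying for parking
fine = 60.00          # penalty if caught without a ticket
p_caught = 0.10       # assumed probability of being caught

expected_cost_of_not_paying = fine * p_caught   # 60.00 * 0.10 = 6.00

if expected_cost_of_not_paying < ticket_cost:
    print("The model says: don't pay")
else:
    print("The model says: pay")

# Buytendijk's point: many of us pay regardless, because it feels like
# the 'right', civic thing to do, a factor no spreadsheet captures.
```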

 

What is Real?

So often, even with all the dashboards, scorecards, reports and charts, senior executives don’t seem to know what’s going on. As in Plato’s Cave, the shadows on the wall are not reality; they are representations of reality. How much more could we really learn by listening to our customers directly rather than waiting for the analysis much later?

 

What is Good?

Predictive analytics can provide great information that allows micro-segmentation. For example, it could help an insurance company to identify those most likely to claim on their insurance policy for back and neck strain, based on their online behaviours. Increasing their premiums might protect the business from additional costs, but the insurance business model is about distributing risk, not identifying it perfectly. Taken to its conclusion there is no need for insurance; we all simply pay for the cost of our own health care as and when it happens. However, if the insurance company used this information to promote lifestyle changes for this group, then ethics and business models are aligned.
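To make the ‘distributing the risk’ point concrete, here is a toy calculation with invented numbers: pooling shares the expected cost of claims across everyone, whereas perfect micro-segmentation simply hands each person back their own risk.

```python
# Toy example, invented numbers: ten policyholders, two of whom our model
# predicts will claim 1,000 each for back and neck strain.
predicted_claims = [0, 0, 1000, 0, 0, 0, 1000, 0, 0, 0]

# Pooled pricing: everyone shares the expected cost of claims.
pooled_premium = sum(predicted_claims) / len(predicted_claims)   # 200.0 each

# 'Perfect' segmentation: each person is charged exactly their own predicted cost.
segmented_premiums = predicted_claims                             # 0 or 1,000

print(pooled_premium)      # 200.0
print(segmented_premiums)  # at this point the insurance has disappeared:
                           # you are simply pre-paying your own claim
```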

 

What’s it all about?

Buytendijk’s quirky, thought-provoking start to the TDWI conference tells us that in IT we are wrestling with problems that preoccupied philosophers centuries ago. It also tells us, though, that in IT we can think too much and reflect too little.

Big Data Analytics: Size is not important

There was a time when databases came in desktop, departmental and enterprise sizes. There was nothing larger than ‘enterprise’ and very few enterprises needed databases that scaled to what was then the largest imaginable unit of data, the terabyte. They even named a database after it.

We now live in the world of the networked enterprise. Last year, according to IDC, the digital universe totalled 1.2 zettabytes of data. And we are only at the beginning of an explosion which is set to grow by as much as 40 times by 2020. Massive data sets are being generated by web logs, social networks, connected devices and RFID tags. This is even before we connect our fridges (and we will) to the internet. Data volumes are growing at such a clip that we needed a new term, Big Data (I know), to describe it.

What is meant by ‘big’ is highly subjective, but the term is loosely used to describe volumes of data that cannot be dealt with by a conventional RDBMS running on conventional hardware. That is to say, alternative approaches to software, hardware or data architectures (Hadoop, map reduce, columnar, distributed data processing etc.) are required.
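For readers unfamiliar with the ‘map reduce’ idea named above, here is a deliberately simplified, single-machine sketch; it is not Hadoop, but it shows the shape of the approach: a map step that runs independently over chunks of data, and a reduce step that merges the partial results.

```python
from collections import Counter
from multiprocessing import Pool

def map_chunk(lines):
    """Map step: count request tokens in one chunk of web log lines."""
    counts = Counter()
    for line in lines:
        counts.update(line.split())
    return counts

def reduce_counts(partial_counts):
    """Reduce step: merge the partial counts from every chunk."""
    total = Counter()
    for partial in partial_counts:
        total.update(partial)
    return total

if __name__ == "__main__":
    # Imagine these are millions of log lines spread across many machines.
    chunks = [
        ["GET /home", "GET /basket", "POST /checkout"],
        ["GET /home", "GET /home", "POST /checkout"],
    ]
    with Pool() as pool:
        partials = pool.map(map_chunk, chunks)
    print(reduce_counts(partials))
```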

Big Data is not just more of the same, though. Big Data is fundamentally different. It’s new, and new data can present new opportunities. According to the McKinsey Global Institute, the use of big data is a key way for leading companies to outperform their peers. Leading retailers, like Tesco, are already using big data to take even more market share.

This is because Big Data represents a fundamental shift from capturing transactions for analysis to capturing interactions. The sources of today’s analytic applications are customer purchases, product returns and supplier purchase orders, whilst Big Data captures every customer click and conversation. It can capture each and every interaction. This represents an extraordinary opportunity to capture, analyse and understand what customers really think about products and services, or how they are responding to a marketing campaign as the campaign is running.

Deriving analytics from big data, from content, unstructured data and natural language conversations, requires a new approach. In spite of the name, though, it’s less about the size and more about the structure (or absence of structure) and the level at which organisations can now understand their businesses and their customers.

BI and Poor Decision making

Good Decision/Bad Decision

This has been something of a preoccupation for me of late. We spend much of our time debating the technologies. We invest valuable time in deciding whether we should go with a mega-vendor (IBM, Oracle, SAP) or a challenger. We agonise over whether it should be cloud or on-premises, mart or warehouse, dimensional or relational. And it is all, frankly, academic if the business is not making good decisions.

There is no shortage of material that tries to make sense of why good people and great businesses make monumentally bad decisions. In the book ‘Think Again: Why Good Leaders Make Bad Decisions’ by Sydney Finkelstein, Jo Whitehead and Andrew Campbell, the focus is on the strategic decisions that have dramatic and highly visible consequences for the organisation.

Good People in Great Organisations Can Make Poor Decisions

An example is one of the UK’s premier retailers, Boots, which enjoys one of the largest footfalls in the UK. Established in the 19th century, it is now a subsidiary of the £20 billion Alliance Boots. In September 1998, the Chief Executive, Steve Russell, excitedly announced a range of healthcare offerings including dentistry, chiropody and laser hair removal. Five years later, the initiative had lost in the region of £100m and Boots needed to break open the piggy bank and look down the back of the sofa for another £50m just to close down the operation and convert that premier retail space back to being … retail. It almost goes without saying that the changes were implemented by a new CEO, Richard Baker.

Apparently, one of the chief reasons for making the move into healthcare services was that a slowdown in the beauty business ‘had been detected’. However, a spokesman was later quoted in the Telegraph as saying that ‘they recognised that these areas are still growing strongly’.

Let’s stop there for a second. Spotting trends in sales and revenue by product category is probably marketing and business 101, and even the most rudimentary business intelligence solution should be trending sales over time. Yet the trend in sales in a key category for Boots was diagnosed as a slowdown and, only a few months later, as growth. Of course, the slowdown may have been a short-term blip, but the point of trending is to smooth these out for the purpose of longer-term planning. And the error in trending might be more understandable had it not been for the fact that the later growth was characterised as ‘strong’.
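To illustrate the smoothing point (with invented monthly figures, not Boots’ actual numbers): a simple rolling average damps a one-month blip so that it is not mistaken for a structural slowdown.

```python
# Invented monthly sales figures for one category: a single soft month (index 4)
# sits inside an otherwise gently rising trend.
sales = [100, 102, 104, 106, 96, 108, 110, 112]

def rolling_average(values, window=3):
    """Simple trailing moving average, the kind used for longer-term planning."""
    return [
        round(sum(values[i - window + 1 : i + 1]) / window, 1)
        for i in range(window - 1, len(values))
    ]

print(rolling_average(sales))
# [102.0, 104.0, 102.0, 103.3, 104.7, 110.0]
# The dip in month five barely registers once the series is smoothed.
```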

Of course, I am not on the board of Boots and I have an advantage shared with all those analysts and commentators that put the boot (or should that be Boots?) into Mr Russell … hindsight. Indeed, it’s a testimony to the strength of Boots as a high street giant that they can make major booboos and still go on to survive and thrive.

The Problem with Decisions …

And organisations are complex systems of individuals and interactions. Large organisations are very complex. This is why organisational decision making doesn’t always stand up to scrutiny when we, as individuals, retrospectively try to apply the logic of rational decision making to such mistakes.

There are a number of problems associated with individuals making decisions. Individuals have biases, self-interest and pre-conceptions. There are also a number of problems with organisational decisions. Groups have to manage conflict and disagreement, and there are dynamics that can produce undesirable outcomes like groupthink.

Today BI’s only Contribution is a Report, Chart or Dashboard

So if we accept that the purpose of Business Intelligence is to help organisations make better decisions (surely there is no debate here?) then Business Intelligence applications have to be more than reports, dashboards and charts.

They need to make decisions easier to collaborate around, and they need to link decisions directly to the information that is required to make them. Furthermore, decisions need to be open, transparent and accountable, not just for the regulators but so that the whole organisation can buy into them.

Decision Making Black Holes

 

A Funny Thing Happens at the Forum

Meetings are one of the most common decision making ‘forums’ we are all regularly involved in. In fact, one in five company meetings we attend is to make a decision. As a way of making decisions, though, they can be problematic. Once the meeting has concluded, the connection between information shared, decisions made and actions taken can be weak, even lost. It’s as if the meeting itself were a decision making black hole.

Some Decisions are More Equal Than Others

Some decision making meetings are impromptu, for making a timely, tactical decision quickly. Others are regular and formal, arranged around the ‘drum beat’ or ‘cadence’ of a business to make more strategic decisions. The more strategic the decision and the longer term the impact, the less frequent the forum, so a Senior or Executive Management Team may only meet quarterly for a business review (a QBR).

How a QBR ‘Rolls’

A typical QBR will see senior managers sharing results in PowerPoint, possibly with financial results in spreadsheets which, I would hope, have at least been extracted from a Business Intelligence application.

If the SMT is reasonably well organised, it will summarise its conclusions and actions in meeting minutes. The meeting minutes will be typed up by an assistant in a Word document and then distributed by email.

Throughout, they will all have been keeping individual notes so will walk out with these in their daybooks. The most senior manager in the room might not do this particularly if it’s their assistant who’s taking the minutes.

Later, actions from daybooks and minutes are likely transferred to individuals’ to-do lists, and all follow-up will be conducted in email and phone calls.

An Implosion of Information, Conclusion and Decision

So let’s recap. Critical decisions about how resources are going to be allocated will be discussed in a QBR, and yet the artefacts of this critical decision making forum are scattered into Word documents, Excel spreadsheets, emails and Outlook tasks. Tiny fragments of the discussion, information, conclusions, decisions and activities implode around the organisation. To be frank, the team is now only going to make progress because the forum was recent and can be relatively easily recalled.

Of course, once time or people move on so does the corporate memory of the decision. Conversations begin with ‘what did we agree to do about that cost over-run?’ or ‘why did we say we were ok with the revenue performance in Q1?’

Executive Attention Deficit Syndrome

Many executives complain of a syndrome that feels like ADS. This is because the more senior the manager, the more things they will probably have to deal with, at an increasingly superficial level. A functional head will probably spend no more than 15 minutes on any one thing. To make decisions productively, they need to have the background, status and related information to hand so that they can deal with each item quickly and move on to the next. Decision making black holes contribute to this feeling of EADS.

CDM and Corporate Memory

Corporate Decision Making platforms will be successful when they connect:

  • Decisions
  • Information on which the decision was made
  • Insight derived from the information
  • Actions taken on the decision
  • Results of the actions

This means total recall of corporate decisions, good and bad, so that, over time, decisions can be recalled, evaluated, re-used or improved. A far cry from current decision making forums which, whilst functional, are inherently flawed and fragmented, and are not improving the timeliness or quality of decisions in our organisations.
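As a sketch only, here is one way those five connections might be modelled in a decision record; the field names and the example are mine, not taken from any product.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class Decision:
    """One corporate decision and everything that should stay attached to it."""
    summary: str
    information: List[str] = field(default_factory=list)  # sources the decision was made on
    insight: List[str] = field(default_factory=list)      # what was concluded from that information
    actions: List[str] = field(default_factory=list)      # what was decided to be done
    results: List[str] = field(default_factory=list)      # what actually happened

# Recorded at the QBR, recallable long after the meeting has faded from memory.
q1_revenue = Decision(
    summary="Accept Q1 revenue shortfall; no corrective action",
    information=["Q1 revenue report from the BI platform"],
    insight=["Shortfall driven by a delayed contract, not lost business"],
    actions=["Re-forecast Q2 with the delayed contract included"],
    results=["Contract signed in April; Q2 back on plan"],
)
```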

BI Requirements Should not just be Gathered

There are many resources remonstrating with the IT community on the importance of gathering requirements. Failing to gather requirements, they warn, will lead to a poor solution delivered late and over budget. This is largely inarguable.

However, I would warn that simply ‘gathering’ requirements is as big a risk. Fred Brooks, author of ‘The Mythical Man-Month’, once said that ‘the hardest part of building a software system is deciding what to build’. And deciding what to build is a two-way process rather than the act of listening, nodding and documenting that we all too often see in Business Intelligence projects.

From time to time, I hear someone cry foul on this assertion. They argue that it seems like the tail is wagging the dog, or that the business cannot compromise on the requirement. I usually point out that simply building what the user asked for doesn’t happen in any other field of engineering. Architects advise on the cost of materials when planning a major new office building, city officials take advice on the best possible location for a bridge, and environmental consultants are actively engaged in deciding exactly if and what should be built in any major civil engineering project.

And this is exactly how we should approach business analytics requirements: as a two-way exploration of what is required, the possible solutions and the implications of each. Incidentally, this is particularly difficult to do if business users are asked to gather and document their own requirements without input from their implementation team.

An example of why this is important is rooted in the fact that many BI technologies (including IBM Cognos) are tools, not programming languages. They have been built around a model to increase productivity. That is, if you understand and work with the assumptions behind the model, reports, dashboards and other BI application objects can be built very quickly. Bend the model and development times increase. Attempting to work completely around the model may result in greatly reduced productivity and therefore vastly increased development time.

So be wary of treating ‘gathering’ and ‘analysis’ as distinct and separate steps. Instead, the process should be an iterative collaboration between users and engineers. Requirements should be understood but so should the implications from a systems perspective. The resulting solution will almost undoubtedly be a better fit and it will significantly increase the chance of it being delivered on time, at the right cost and with an increased understanding between those that need the systems and those that build them.

"We fail more often because we solve the wrong problem than because we get the wrong solution to the right problem", Russell Ackoff, 1974

Single version of the truth, philosophy or reality?

Assuming you want the truth and you can handle it, then you will have heard this a lot: the purpose of our new (BI/analytics/data warehouse) project is to deliver ‘a single version of the truth’. In a project we are engaged with right now the expression is ‘one version of reality’, or 1VOR. For UK boomers that will almost undoubtedly bring to mind a steam engine, but I digress.

I have to admit, I find the term jarring whenever I hear it because it implies something simple and attainable through a single system, which is rarely the reality.

In fact it’s rarely attained, causing some of our community to ponder its viability or even whether it exists. Robin Bloor’s ‘Is there a single version of the Truth’ and ‘Beyond a single version of the truth’ on the Obsessive-Compulsive Data Quality blog are great examples.

Much on this subject has been written by data quality practitioners, and it speaks to master data management and the desire, for example, for a single and consistent view of a customer. Banks often don’t understand customers, they understand accounts, and if the number of (err, for example, Hotel Chocolat) home shopping brochures I receive is anything to go by, then many retailers don’t get it either. Personally, I want my bank and my chocolatier to know when I am interacting with them. I’m a name, not a number, particularly when it comes to chocolate.

This problem is also characterised by the tired and exasperated tone of a Senior Manager asking (and sometimes begging) for a single version of the truth. This is usually because they had a ‘number’ (probably revenue) and went to speak to one of their Department Heads about it (probably because it was unexpectedly low), and rather than spending time on understanding what the number means or what the business should do, they spent 45 minutes comparing the Senior Manager’s ‘number’ with the Department Head’s ‘number’. In trying to reconcile them, they also found some more ‘numbers’ too. It probably passed the time nicely. Make this a monthly meeting or a QBR involving a number of department heads and the 45 minutes will stretch into hours without any real insight from which decisions might have been made.

This is partly about provenance. Ideally the number came from a single system of record (Finance, HR) or corporate BI, but it most likely came from a spreadsheet or, even worse, a presentation with a spreadsheet embedded in it.

It’s also about purity (or the addition of impurities, at least). It might have started pure, but the Department Head, or an analyst that works in their support and admin team, calculated the number based on an extract from the finance system and possibly some other spreadsheets. The numbers were probably adjusted because of some departmental nuance. For example, if it’s a sales team, the Sales Manager might include all the sales for a rep that joined part way through the year, whilst Finance left the revenue with the previous team.
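A small, invented illustration of how that nuance plays out: the same three deals, totalled under the Finance rule and under the Sales Manager’s rule, give two defensible but different ‘numbers’.

```python
# Invented deals: (sales rep, amount, attributed_to_current_team)
deals = [
    ("rep_a", 50_000, True),
    ("rep_b", 30_000, True),
    ("rep_c", 20_000, False),  # rep joined mid-year; Finance credits the previous team
]

# Finance view: only revenue attributed to the current team counts.
finance_revenue = sum(amount for _, amount, current in deals if current)

# Sales Manager view: everything their reps sold this year counts.
sales_revenue = sum(amount for _, amount, _ in deals)

print(finance_revenue)  # 80000
print(sales_revenue)    # 100000 - the same deals, two defensible 'truths'
```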

It will be no comfort (or surprise) to our Senior Manager that it is a Master Data Management problem too. Revenue by product can only make sense if everyone in the organisation can agree on the brands, categories and products that classify the things that are sold. Superficially this sounds simple, but even this week I have spoken with a global business that is launching a major initiative, involving hundreds of man hours, to resolve just this issue.

It’s also about terminology. We sacrifice precision in language for efficiency. In most organisations we dance dangerously around synonyms and homonyms because it mostly doesn’t catch us out. Net revenue … net of what? And whilst we are on the subject … revenue. Revenue as it was ordered, as it was delivered, as it was invoiced and as it is recognised according to GAAP rules in the finance system. By the way, does your number include credit notes? And this is a SIMPLE example. Costs are often centralised, allocated or shared in some way, all dependent on a set of rules that only a handful of people in the finance team really understand.

Finally, it’s about perspective. Departments in an organisation often talk about the same things but mean subtly different things because they have different perspectives. The sales team mean ordered revenue, because once someone has signed hard (three copies) their job is done, whilst the SMT are probably concerned about the revenue that they share with the markets in their statutory accounts.

So is a single version of the truth philosophy? Can it really be achieved? The answer is probably that there are multiple versions of the truth but they are, in many organisations, all wrong. Many organisations are looking at different things from differing perspectives, and they are ALL inaccurate.

A high performing organisation should be trying to unpick these knots, in priority order, one at a time. Eventually it will be able to look at multiple versions of the truth and understand its business from multiple perspectives. Indeed, the differences between the truths will probably tell it something it didn’t know from what it used to call ‘the single version of the truth’.

More and more choices for BI Solution Architects

We analytics practitioners have always had the luxury of alternatives to the RDBMS as part of our data architecture choices. OLAP of one form or another has been providing what one of my colleagues calls ‘query at the speed of thought’ for well over a decade. However, the range of options available to a solutions architect today is bordering on overwhelming.

First off, the good old RDBMS offers hashing, materialised views, bitmap indexes and other physical implementation options that don’t really require us to think too differently about the raw SQL. The columnar database, and implementations of it in products like Sybase IQ, are another option. The benefits are not necessarily obvious. We data geeks always used to think the performance issues were about joining, but then the smart people at Infobright, Kickfire et al told us that shorter rows are the answer to really fast queries on large data volumes. There is some sense in this given that disk I/O is an absolute bottleneck, so fewer columns means less redundant data reading. The Oracle and Microsoft hats are in the columnar ring (if you will excuse the garbled geometry and mixed metaphor) with Exadata 2 and Gemini/VertiPaq, so they are becoming mainstream options.
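A crude, purely illustrative way to see the intuition (this is plain Python, nothing like a real columnar engine): hold the same small table row-wise and column-wise, and notice that a single-column aggregate only needs to touch one column in the columnar layout.

```python
# The same tiny table stored two ways.
rows = [
    {"customer": "A", "region": "North", "revenue": 120, "cost": 80},
    {"customer": "B", "region": "South", "revenue": 200, "cost": 150},
    {"customer": "C", "region": "North", "revenue": 90,  "cost": 60},
]

columns = {
    "customer": ["A", "B", "C"],
    "region":   ["North", "South", "North"],
    "revenue":  [120, 200, 90],
    "cost":     [80, 150, 60],
}

# Row store: every field of every row is read just to sum one column.
total_row_store = sum(row["revenue"] for row in rows)

# Column store: only the revenue column is read; the others never leave disk.
total_column_store = sum(columns["revenue"])

assert total_row_store == total_column_store == 410
```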


Data warehouse appliances are yet another option. These combined hardware, operating system and software solutions, usually massively parallel (MPP), deliver high performance on really large volumes. And by large we probably mean peta, not tera. Sorry NCR, tera just doesn’t impress anyone anymore. And whilst we are on the subject of Teradata, it was probably one of the first appliances, but then NCR strategically decided to go open shortly before the data warehouse appliance market really opened up. The recent IBM acquisition of Netezza and the presence of Oracle and NCR are reshaping what was once considered niche and special into the mainstream.


We have established that the absolute bottleneck is disk I/O, so in-memory options should be a serious consideration. There are in-memory BI products, but the action is really where the data is. Databases include TimesTen (now Oracle’s) and IBM’s solidDB. Of course, TM1 fans will point out that they had in-memory OLAP when they were listening to Duran Duran CDs, and they would be right.

The cloud has to get a mention here because it is changing everything. We can’t ignore those databases that have grown out of the need for massive data volumes, like Google’s BigTable, Amazon’s RDS and Hadoop. They might not have been built with analytics in mind, but they are offering ways of dealing with unstructured and semi-structured data, and this is becoming increasingly important as organisations include data from online editorial and social media sources in their analytics. All of that being said, large volumes and limited pipes are keeping many on-premises for now.

So, what’s the solution? Well, that is the job of the Solutions Architect. I am not sidestepping the question (well, actually, I am a little). However, it is time to examine the options and identify which information management technologies should form part of your data architecture. It is no longer enough to simply choose an RDBMS.