Decision-making problems are not new; in fact, they are centuries old

Frank Buytendijk delivered a great keynote at 8am in Las Vegas at the TDWI conference in February 2012. He avoided the technicalities of data architectures, the rigours of data modelling and the disciplines of agile methods.

Instead, over breakfast, he dipped into the world of philosophy and asked us to consider the centuries-old questions: what is true, what is real and what is good?

Referring to Plato, Thales and Machiavelli, Buytendijk led us through some fundamentals of decision making.

What is True?

First, decisions are not just about the data. Do we pay for parking because we weigh the cost of a ticket against the cost of a fine multiplied by the probability of getting one? Or do we do it because we think it is the ‘right’ thing to do, the ‘civic’ thing to do?
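Taken purely as a data problem, the parking choice is an expected-value comparison. A minimal sketch, with a ticket price, fine and probability that are invented for illustration:

```python
def should_pay_for_parking(ticket_cost: float, fine: float, p_fine: float) -> bool:
    """Return True if buying a ticket costs no more than the expected fine.

    The expected cost of not paying is the fine weighted by the
    probability of actually getting one.
    """
    expected_fine = fine * p_fine
    return ticket_cost <= expected_fine


# Hypothetical figures: a £4 ticket, a £70 fine and a 5% chance of being caught.
print(should_pay_for_parking(4.0, 70.0, 0.05))  # expected fine is £3.50, so this prints False
```

On those assumed numbers the arithmetic says don't buy the ticket, yet most of us do, which is rather the point: the data alone doesn't explain the decision.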

What is Real?

So often, even with all the dashboards, scorecards, reports and charts, senior executives don’t seem to know what’s going on. As in Plato’s Cave, the shadows on the wall are not reality; they are representations of reality. How much more could we learn by listening to our customers directly rather than waiting for the analysis much later?

What is Good?

Predictive analytics can provide great information that allows micro-segmentation. For example, it could help an insurance company identify the customers most likely to claim on their policy for back and neck strain, based on their online behaviour. Increasing their premiums might protect the business from additional costs, but the insurance business model is about distributing risk, not identifying it perfectly. Taken to its conclusion, there would be no need for insurance at all: we would each pay for our health care as and when we needed it. However, if the insurance company used this information to promote lifestyle changes for this group, then ethics and business model are aligned.
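The tension between pooling and micro-segmentation is easy to see with a few numbers. This sketch assumes each customer's expected annual claim cost is already known (the figures and segment names are made up) and compares one pooled premium with a premium per segment; real pricing adds expenses, margins and a great deal more:

```python
from statistics import mean

# Hypothetical expected annual claim cost per customer (in £), not real data.
expected_claims = {
    "low_risk": [50.0, 60.0, 40.0, 50.0],
    "high_risk": [400.0, 500.0],
}


def pooled_premium(groups: dict[str, list[float]]) -> float:
    """Everyone pays the same: the average expected cost across the whole pool."""
    everyone = [cost for costs in groups.values() for cost in costs]
    return mean(everyone)


def segmented_premiums(groups: dict[str, list[float]]) -> dict[str, float]:
    """Micro-segmentation: each segment pays its own average expected cost."""
    return {name: mean(costs) for name, costs in groups.items()}


print(round(pooled_premium(expected_claims), 2))
print(segmented_premiums(expected_claims))
```

The pooled premium sits at about £183 for everyone, while segmentation charges the high-risk group £450. Push the segments down to individuals whose claims can be predicted perfectly and each premium simply becomes that person's own claim cost: nothing is being shared any more.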

What’s it all about?

Buytendijk’s quirky, thought-provoking start to the TDWI conference tells us that in IT we are wrestling with problems that preoccupied philosophers centuries ago. It also tells us, though, that in IT we can think too much and reflect too little.


Big Data Analytics: Size is not important

There was a time when databases came in desktop, departmental and enterprise sizes. There was nothing larger than ‘enterprise’, and very few enterprises needed databases that scaled to what was then the largest imaginable unit of data, the terabyte. They even named a database after it.

We now live in the world of the networked enterprise. Last year, according to IDC, the digital universe totalled 1.2 zettabytes of data, and we are only at the beginning of the explosion: volumes are set to grow by as much as 40 times by 2020. Massive data sets are being generated by web logs, social networks, connected devices and RFID tags, and that is even before we connect our fridges (and we will) to the internet. Data volumes are growing at such a lick that we needed a new term, Big Data (I know), to describe them.

What is meant by ‘big’ is highly subjective, but the term is loosely used to describe volumes of data that cannot be handled by a conventional RDBMS running on conventional hardware. That is to say, alternative approaches to software, hardware or data architecture (Hadoop, MapReduce, columnar, distributed data processing, etc.) are required.
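For readers who have not met MapReduce, here is a single-process sketch of its three phases (map, shuffle, reduce), counting page hits in a few invented web-log lines. It only illustrates the programming model; the log format and function names are assumptions, and a framework such as Hadoop distributes these same phases across many machines:

```python
from collections import defaultdict
from typing import Iterable, Iterator


def map_phase(lines: Iterable[str]) -> Iterator[tuple[str, int]]:
    """Emit (page, 1) for each log line; assumes the page is the second field."""
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:  # skip malformed lines
            yield fields[1], 1


def shuffle_phase(pairs: Iterable[tuple[str, int]]) -> dict[str, list[int]]:
    """Group emitted values by key, as the framework would between phases."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(groups: dict[str, list[int]]) -> dict[str, int]:
    """Combine the values for each key; here, a simple sum."""
    return {key: sum(values) for key, values in groups.items()}


# Invented log lines in an assumed "<ip> <page>" format.
log = [
    "10.0.0.1 /home",
    "10.0.0.2 /products",
    "10.0.0.1 /home",
    "10.0.0.3 /basket",
]
print(reduce_phase(shuffle_phase(map_phase(log))))
```

In a real cluster the map and reduce functions run in parallel on different nodes and the framework performs the shuffle over the network; the per-record logic stays this simple, which is much of the model's appeal.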

Big Data is not just more of the same, though. Big Data is fundamentally different. It’s new, and new data can present new opportunities. According to the McKinsey Global Institute, the use of big data is a key way for leading companies to outperform their peers. Leading retailers, like Tesco, are already using big data to take even more market share.

This is because Big Data represents a fundamental shift from capturing transactions for analysis to capturing interactions. Today’s analytic applications draw on customer purchases, product returns and supplier purchase orders, whilst Big Data captures every customer click and conversation; it can capture each and every interaction. This represents an extraordinary opportunity to capture, analyse and understand what customers really think about products and services, or how they are responding to a marketing campaign while the campaign is still running.

Deriving analytics from big data, from content, unstructured data and natural-language conversations, requires a new approach. In spite of the name, though, it’s less about the size and more about the structure (or absence of structure) and the level of detail at which organisations can now understand their businesses and their customers.

BI and Poor Decision Making

Good Decision/Bad Decision

This has been something of a preoccupation for me of late. We spend much of our time debating the technologies. We invest valuable time in deciding whether we should go with the mega-vendors (IBM, Oracle, SAP) or a challenger. We agonise over whether it should be cloud or on-premises, mart or warehouse, dimensional or relational. And it is all, frankly, academic if the business is not making good decisions.

There is no shortage of material that tries to make sense of why good people and great businesses make monumentally bad decisions. In the book ‘Think Again: Why Good Leaders Make Bad Decisions’ by Sydney Finkelstein, Jo Whitehead and Andrew Campbell, the focus is on the strategic decisions that have dramatic and highly visible consequences for the organisation.

Good People in Great Organisations Can Make Poor Decisions

An example is Boots, one of the UK’s premier retailers, which enjoys one of the largest footfalls in the country. Established in the 19th century, it is now a subsidiary of the £20 billion Alliance Boots. In September 1998 the Chief Executive, Steve Russell, excitedly announced a range of healthcare offerings including dentistry, chiropody and laser hair removal. Five years later the initiative had lost in the region of £100m, and Boots needed to break open the piggy bank and look down the back of the sofa for another £50m just to close down the operation and convert that premier retail space back to being … retail. It almost goes without saying that the retreat was implemented by a new CEO, Richard Baker.

Apparently, one of the chief reasons for moving into healthcare services was that a slowdown in the beauty business ‘had been detected’. However, a spokesman was later quoted in the Telegraph as saying that ‘they recognised that these areas are still growing strongly’.

Let’s stop there for a second. Spotting trends in sales and revenue by product category is probably Marketing and Business 101, and even the most rudimentary business intelligence solution should be trending sales over time. Yet the trend in sales in a key category for Boots was diagnosed as a slowdown and, only a few months later, as growth. Of course, the slowdown may have been a short-term blip, but the point of trending is to smooth such blips out for the purpose of longer-term planning. And the error in trending might be more understandable had the later growth not been characterised as ‘strong’.
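To see why a single month is a poor basis for diagnosing a slowdown, compare a naive month-on-month change with a least-squares trend over the whole period. The sales figures are invented, not Boots’ data, and a linear slope is only one of several reasonable ways to trend:

```python
from statistics import mean


def month_on_month_change(sales: list[float]) -> float:
    """Naive signal: the latest month's sales minus the previous month's."""
    return sales[-1] - sales[-2]


def trend_slope(sales: list[float]) -> float:
    """Least-squares slope of sales against month number (change per month)."""
    months = range(len(sales))
    m_bar, s_bar = mean(months), mean(sales)
    numerator = sum((m - m_bar) * (s - s_bar) for m, s in zip(months, sales))
    denominator = sum((m - m_bar) ** 2 for m in months)
    return numerator / denominator


# Hypothetical monthly sales for one category: steady growth, then a one-month dip.
sales = [100, 102, 104, 106, 108, 110, 112, 114, 106]
print(month_on_month_change(sales))  # -8: looks like a slowdown
print(round(trend_slope(sales), 2))  # 1.33: the trend is still upward
```

The latest month reads as a slowdown, while the trend across the whole period is still upward at roughly 1.3 units a month.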

Of course, I am not on the board of Boots, and I have an advantage shared with all those analysts and commentators who put the boot (or should that be Boots) into Mr Russell … hindsight. Indeed, it’s a testimony to the strength of Boots as a high street giant that it can make major booboos and still go on to survive and thrive.

The Problem with Decisions …

Organisations are complex systems of individuals and interactions, and large organisations are very complex. This is why organisational decision making doesn’t always stand up to the scrutiny of individuals who retrospectively try to apply the logic of rational decision making to such mistakes.

There are a number of problems associated with individuals making decisions: individuals have bias, self-interest and preconceptions. There are also a number of problems with organisational decisions: groups have to manage conflict and disagreement, and there are dynamics that can produce undesirable outcomes like groupthink.

Today BI’s only Contribution is a Report, Chart or Dashboard

So if we accept that the purpose of Business Intelligence is to help organisations make better decisions (surely there is no debate here?), then Business Intelligence applications have to be more than reports, dashboards and charts.

They need to make decisions easier to collaborate around, and they need to link decisions directly to the information required to make them. Furthermore, decisions need to be open, transparent and accountable, not just for the regulators but so that the whole organisation can buy into them.

Death of the BI Generalist

I have fond memories of the 90s. I spent a fair amount of them as a ‘Cognos Consultant’, travelling the length and breadth of the UK being an expert in all things Cognos. There was a great sense of independence and autonomy. It was possible to pitch up at a new client with two CDs (one for PowerPlay, one for Impromptu), install the software on a handful of PCs on Monday morning, and by Friday have interviewed for requirements, built a metadata model, cubes and reports, and be huddled around a desktop with the users putting the final touches to what I dared to call a BI application.

It was the day of the renaissance BI consultant, the polymath. We all felt like experts from the moment we opened the jewel case to the point the user signed off their brand new Business Intelligence application.

As much as I enjoyed this, I don’t yearn for those simpler times. The solutions we built were great and delivered real, incremental value. They didn’t, however, have the breadth, depth and coverage of the best implementations today, which are deployed over the web to hundreds, sometimes thousands, of users across multiple business domains. Modern performance management solutions have brought Sales, Marketing, Operations, Finance and the Senior Management Team together in ways that were less achievable, perhaps even impossible, ten years ago.

There is a cost, though. The range of skills required to deploy a modern BI (or Business Analytics) application is broad. Take the individual who installs the software. What used to require a working knowledge of Windows and an installation manual now also needs an understanding of web servers, operating systems, virtualisation, application servers and security, all in multiple flavours. It may also require systems management conversations to configure fail-over and load-balancing, which weren’t needed before because they weren’t possible. And whilst the internet means we are all effectively sharing a single network, it stands to reason that we need to understand all the different ways of communicating and processing on that single, global network.

So far, we have only unwrapped the cellophane; things get really interesting when we get our hands on data. The possibilities for delivering value from the vast array of corporate data assets have improved dramatically, but this means that the data needs to be integrated too. This requires an in-depth understanding of relational and dimensional modelling techniques, databases, SQL and at least one ETL tool. Whilst some tools simplify this process (usually with OLAP cubes), they are typically departmental in scale. Modern solutions may also require a mix of on-premises and cloud data and a mix of structured and unstructured data, which increases the richness of the solution but also the number of things a developer needs to keep in their head.

So where does this leave us? If you have a team of homogeneous ‘BI Developers’, then their skills may be too general. I am working with one client in an advisory capacity at the moment, and they have one consultant who has implemented everything. He has been very successful, too. However, their business is of a size where he can wrap his arms around the requirements, the infrastructure and the data. He also has exceptional aptitude and is an outstanding consultant who throws herculean effort at their projects. For the rest of us mortals, we need to divvy up the responsibilities.

Many of my clients tend to divide the work across technical/systems, data and application. It’s not the only way, but it works really well. The IS team picks up the installation and maintenance of the application server environment, a data team creates a single reporting database/data warehouse/marts, and a BI team maintains the metadata and the reporting application in all its variants of reports, analysis, dashboards, scorecards, etc.

There are interesting implications for the modern, complex outsourcing and offshoring organisation. Another of my clients, based in the City, outsources its databases, including data warehouse development. Because there is a cost each time they want to add a new data item for BI, they tended to try to resolve new requirements in the metadata or the reporting application. Inevitably, the quality of their solution deteriorated over time because they were not always extending or fixing in the right place. Eventually the sticking plasters gave out and they were forced back to first principles, with the cost and rework that implies.

The more common issue is one of shared design. Changes to data impact ETL, metadata, reports and the application, and a new requirement might need to be changed in one or all of these places. Whilst these are usually a number of small and straightforward changes, they do need to be applied consistently. To co-ordinate activity when the solution is expanded or enhanced, there needs to be a common data model. In our experience this needs to be two models: first, a logical (and relational) model that represents the business data in a perfect and integrated world; second, a dimensional model that represents the data as the business sees it in their reporting application, through reports and metadata. These are surprisingly rare in our experience, and usually not because they take a long time to draw (they don’t). Our suspicion is that what makes them time-consuming is the need for consensus. However, if there is no consensus, then there is a risk that the solution is already flawed.
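To make the second of those models concrete, here is a minimal sketch of a dimensional (star) schema in SQLite: one fact table of sales surrounded by product and date dimensions, plus the kind of query a reporting tool would generate against it. The table names, column names and rows are invented for illustration, and the logical model that would sit upstream is not shown:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,
    product_name TEXT NOT NULL,
    category     TEXT NOT NULL
);
CREATE TABLE dim_date (
    date_key INTEGER PRIMARY KEY,  -- e.g. 20120201
    month    TEXT NOT NULL
);
CREATE TABLE fact_sales (
    product_key INTEGER REFERENCES dim_product(product_key),
    date_key    INTEGER REFERENCES dim_date(date_key),
    revenue     REAL NOT NULL
);
INSERT INTO dim_product VALUES (1, 'Lipstick', 'Beauty'), (2, 'Sun cream', 'Health');
INSERT INTO dim_date VALUES (20120201, '2012-02'), (20120301, '2012-03');
INSERT INTO fact_sales VALUES (1, 20120201, 120.0), (2, 20120201, 80.0), (1, 20120301, 150.0);
""")

# Revenue by category and month: join the facts to their dimensions, then aggregate.
rows = conn.execute("""
    SELECT p.category, d.month, SUM(f.revenue) AS revenue
    FROM fact_sales AS f
    JOIN dim_product AS p ON p.product_key = f.product_key
    JOIN dim_date AS d ON d.date_key = f.date_key
    GROUP BY p.category, d.month
    ORDER BY p.category, d.month
""").fetchall()
print(rows)
```

A change here (say, a new attribute on `dim_product`) ripples through the ETL that loads it, the metadata layer and every report that groups by it, which is exactly why the design needs to be shared.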

So a BI/Analytics solution cannot be built single-handedly. It requires a range of skills, and it’s difficult to be an expert in all of them. The generalist, then, is fading away and being replaced by a team with a shared design that can deploy solutions with greater reach and richness than we could ever have believed possible a few short years ago.

Single version of the truth, philosophy or reality?

Assuming you want the truth and can handle it, you will have heard this a lot: the purpose of our new (BI/Analytics/Data Warehouse) project is to deliver ‘a single version of the truth’. In a project we are engaged with right now, the expression is ‘one version of reality’, or 1VOR. For UK boomers that will almost undoubtedly bring to mind a steam engine, but I digress.

I have to admit I find the term jarring whenever I hear it, because it implies something simple and attainable through a single system, which is rarely the reality.

In fact, it’s so rarely attained that some of our community have pondered its viability, or even whether it exists. Robin Bloor’s ‘Is there a single version of the Truth’ and ‘Beyond a single version of the truth’ in the Obsessive Compulsive Data Quality blog are great examples.

Much has been written on this subject by data quality practitioners, and it speaks to master data management and the desire, for example, for a single and consistent view of a customer. Banks often don’t understand customers; they understand accounts. And if the number of home shopping brochures I receive from (err, for example) Hotel Chocolat is anything to go by, then many retailers don’t get it either. Personally, I want my bank and my chocolatier to know when I am interacting with them. I’m a name, not a number, particularly when it comes to chocolate.

This problem is also characterised by the tired and exasperated tone of a Senior Manager asking (and sometimes begging) for a single version of the truth. This is usually because they had a ‘number’ (probably revenue) and went to speak to one of their Department Heads about it (probably because it was unexpectedly low), and rather than spending time understanding what the number means or what the business should do, they spent 45 minutes comparing the Senior Manager’s ‘number’ with the Department Head’s ‘number’. In trying to reconcile them, they also found some more ‘numbers’. It probably passed the time nicely. Make this a monthly meeting or a QBR involving several department heads, and the 45 minutes will stretch into hours without any real insight on which decisions might be based.

This is partly about provenance. Ideally the number came from a single system of record (Finance, HR) or corporate BI, but it most likely came from a spreadsheet or, even worse, a presentation with a spreadsheet embedded in it.

It’s also about purity (or the addition of impurities, at least). It might have started pure, but the department head, or an analyst in their support and admin team, calculated the number from an extract from the finance system and possibly some other spreadsheets. The numbers were probably adjusted because of some departmental nuance. For example, in a Sales team, the Sales Manager might include all the sales for a rep who joined part way through the year, whilst Finance left that revenue with the previous team.
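The rep example is easy to reproduce. This sketch uses invented deals and an assumed team-history structure to compute the same team’s revenue under the two rules; both rules are internally consistent, and they still disagree:

```python
from datetime import date

# Hypothetical data: one rep, Alice, moved from team "North" to "South" mid-year.
deals = [
    {"rep": "Alice", "closed": date(2011, 3, 10), "amount": 40_000},
    {"rep": "Alice", "closed": date(2011, 9, 5), "amount": 25_000},
]
# Each rep's team history as (team, start date), oldest first.
team_history = {"Alice": [("North", date(2010, 1, 1)), ("South", date(2011, 7, 1))]}


def team_on(rep: str, day: date) -> str:
    """Return the team the rep belonged to on a given day."""
    current = "unassigned"
    for team, start in team_history[rep]:
        if start <= day:
            current = team
    return current


def finance_view(team: str) -> int:
    """Finance: revenue stays with the team the rep was in when the deal closed."""
    return sum(d["amount"] for d in deals if team_on(d["rep"], d["closed"]) == team)


def sales_manager_view(team: str) -> int:
    """Sales Manager: all of a rep's revenue follows them to their current team."""
    current_team = {rep: history[-1][0] for rep, history in team_history.items()}
    return sum(d["amount"] for d in deals if current_team[d["rep"]] == team)


print(finance_view("South"), sales_manager_view("South"))  # 25000 65000
```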

It will be no comfort (or surprise) to our Senior Manager that this is also a master data management problem. Revenue by product only makes sense if everyone in the organisation can agree on the brands, categories and products that classify the things that are sold. Superficially this sounds simple, but even this week I have spoken with a global business that is launching a major initiative, involving hundreds of man-hours, to resolve just this issue.
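Here is the master data problem in miniature: two hypothetical systems code the same products differently, and revenue by category can only be totalled once every code maps to an agreed product and category. The codes, categories and mapping below are invented for illustration:

```python
# Hypothetical sales lines from two systems that code the same products differently.
store_sales = [("SKU-001", 120.0), ("SKU-002", 80.0), ("SKU-003", 50.0)]
web_sales = [("sku001", 30.0), ("sku003", 20.0), ("sku999", 10.0)]

# The agreed master data: every source code maps to one product and one category.
master = {
    "SKU-001": ("P001", "Beauty"),
    "sku001": ("P001", "Beauty"),
    "SKU-002": ("P002", "Health"),
    "SKU-003": ("P003", "Beauty"),
    "sku003": ("P003", "Beauty"),
}


def revenue_by_category(*sources: list[tuple[str, float]]) -> tuple[dict[str, float], list[str]]:
    """Total revenue per agreed category, plus any source codes with no master record."""
    totals: dict[str, float] = {}
    unmapped: list[str] = []
    for source in sources:
        for code, amount in source:
            if code not in master:
                unmapped.append(code)  # revenue we cannot classify until the master data is agreed
                continue
            _product, category = master[code]
            totals[category] = totals.get(category, 0.0) + amount
    return totals, unmapped


totals, unmapped = revenue_by_category(store_sales, web_sales)
print(totals, unmapped)
```

The unmapped list is the interesting output: every code on it is revenue that someone, somewhere, is reporting under a category nobody else has agreed to.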

It’s also about terminology. We sacrifice precision in language for efficiency. In most organisations we dance dangerously around synonyms and homonyms because they mostly don’t catch us out. Net revenue … net of what? And whilst we are on the subject … revenue. Revenue as it was ordered, as it was delivered, as it was invoiced, or as it is recognised according to GAAP rules in the finance system? By the way, does your number include credit notes? And this is a SIMPLE example. Costs are often centralised, allocated or shared in some way, all dependent on a set of rules that only a handful of people in the finance team really understand.
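Even the ‘simple’ revenue example yields several defensible numbers from the same orders. A sketch over a handful of hypothetical orders; revenue as recognised under GAAP depends on accounting rules that are not modelled here:

```python
from dataclasses import dataclass


@dataclass
class Order:
    amount: float
    delivered: bool
    invoiced: bool
    credit_note: float = 0.0  # value credited back to the customer, if any


# Hypothetical orders for one month.
orders = [
    Order(1000.0, delivered=True, invoiced=True),
    Order(500.0, delivered=True, invoiced=True, credit_note=100.0),
    Order(300.0, delivered=True, invoiced=False),
    Order(200.0, delivered=False, invoiced=False),
]

# Four answers to "what was revenue this month?", all from the same data.
revenue = {
    "ordered": sum(o.amount for o in orders),
    "delivered": sum(o.amount for o in orders if o.delivered),
    "invoiced": sum(o.amount for o in orders if o.invoiced),
    "invoiced_net_of_credits": sum(o.amount - o.credit_note for o in orders if o.invoiced),
}
print(revenue)
```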

Finally, it’s about perspective. Departments in an organisation often talk about the same things but mean subtly different things because they have different perspectives. The sales team means ordered revenue, because once someone has signed hard (three copies) their job is done, whilst the SMT is probably concerned with the revenue it shares with the markets in its statutory accounts.

So is a single version of the truth philosophy? Can it really be achieved? The answer is probably that there are multiple versions of the truth but, in many organisations, they are all wrong. Many organisations are looking at different things from differing perspectives, and they are ALL inaccurate.

A high-performing organisation should be trying to unpick these knots, in priority order, one at a time. Eventually it will be able to look at multiple versions of the truth and understand its business from multiple perspectives. Indeed, the differences between the truths will probably tell it something it didn’t know from what it used to call ‘the single version of the truth’.

More and more choices for BI Solution Architects

We analytics practitioners have always had the luxury of alternatives to the RDBMS in our data architecture choices. OLAP of one form or another has been providing what one of my colleagues calls ‘query at the speed of thought’ for well over a decade. However, the range of options available to a solutions architect today is bordering on overwhelming.

First off, the good old RDBMS offers hashing, materialised views, bitmap indexes and other physical implementation options that don’t really require us to think very differently about the raw SQL. Columnar databases, implemented in products like Sybase IQ, are another option, though their benefits are not necessarily obvious. We data geeks always used to think the performance issues were about joining, but then the smart people at Infobright, Kickfire et al. told us that shorter rows are the answer to really fast queries on large data volumes. There is some sense in this: disk I/O is an absolute bottleneck, so reading fewer columns means reading less redundant data. Oracle and Microsoft have thrown their hats into the columnar ring (if you will excuse the garbled geometry and mixed metaphor) with Exadata 2 and Gemini/VertiPaq, so columnar storage is becoming a mainstream option.
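A back-of-the-envelope sketch of the I/O argument for columnar storage, assuming a hypothetical table of 100 million rows and 20 fixed-width, 8-byte columns, and a query that needs only two of them. It deliberately ignores compression (which columnar stores also exploit heavily), indexes and caching:

```python
# Hypothetical table: 100 million rows, 20 columns of 8 bytes each.
ROWS = 100_000_000
COLUMN_WIDTHS = {f"col{i}": 8 for i in range(20)}


def bytes_scanned_row_store() -> int:
    """A row store reads whole rows, so every column is read from disk."""
    return ROWS * sum(COLUMN_WIDTHS.values())


def bytes_scanned_column_store(columns_needed: list[str]) -> int:
    """A column store reads only the columns the query actually references."""
    return ROWS * sum(COLUMN_WIDTHS[c] for c in columns_needed)


# A query that touches 2 of the 20 columns, e.g. one measure and one grouping column.
needed = ["col0", "col3"]
row_bytes = bytes_scanned_row_store()
col_bytes = bytes_scanned_column_store(needed)
print(f"row store: {row_bytes / 1e9} GB, column store: {col_bytes / 1e9} GB")
```

On these assumptions the row store reads 16 GB and the column store 1.6 GB, a tenfold difference that comes purely from not reading columns the query never uses.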


Data warehouse appliances are yet another option. These combined hardware, operating system and software solutions, usually built on massively parallel processing (MPP), deliver high performance on really large volumes. And by large we probably mean peta, not tera. Sorry NCR, tera just doesn’t impress anyone anymore. And whilst we are on the subject of Teradata, it was probably one of the first appliances, but NCR strategically decided to go open shortly before the data warehouse appliance market really took off. IBM’s recent acquisition of Netezza, and the presence of Oracle and NCR, are reshaping what was once considered niche and special into the mainstream.


We have established that the absolute bottleneck is disk I/O, so in-memory options should be a serious consideration. There are in-memory BI products, but the action is really where the data is. Databases include TimesTen (now Oracle’s) and IBM’s solidDB. Of course, TM1 fans will point out that they had in-memory OLAP when they were listening to Duran Duran CDs, and they would be right.

The cloud has to get a mention here because it is changing everything. We can’t ignore those databases that have grown out of the need to handle massive data volumes, like Google’s BigTable, Amazon’s RDS and Hadoop. They might not have been built with analytics in mind, but they offer ways of dealing with unstructured and semi-structured data, and this is becoming increasingly important as organisations include data from online editorial and social media sources in their analytics. All that being said, large volumes and limited pipes are keeping many organisations on-premises for now.

So, what’s the solution? Well, that is the job of the Solutions Architect. I am not sidestepping the question (well, actually, I am a little). However, it’s time to examine the options and identify which information management technologies should form part of your data architecture. It is no longer enough simply to choose an RDBMS.