The couple in Ira Gershwin’s song ‘Let’s Call the Whole Thing Off’ lamented the way they pronounced the same words differently because it exposed class differences which might eventually be their undoing. Human communication is a funny thing. If Fred Astaire and Ginger Rogers had met on Facebook then, regardless of how they pronounced neither, either and tomato, they would have assumed that they, like the spelling, were a perfect match.
Understanding nuance in human communication is a preoccupation for those of us building social media analytics applications, specifically as it applies to the social listening process. Social listening is the data collection process in a social media analytics application, the point at which the vast sea of blog, editorial and social media content is collected and converted into usable analysis. Its purpose is to collect and filter ‘mentions’: instances of a company, brand, product or marketing campaign being referenced in an item of online content. Most platforms are good at collecting mentions but many fall short on accuracy, not because of scale and volume but because they don’t understand the human capacity for saying the same thing in so many different ways.
Fred and Ginger were both speaking (American) English and yet still had problems because language is only one of the many considerations when we try to understand the written word. Slang, regional idioms and differences in style relating to social groupings, profession, generation and gender are just a few others.
Anyone with teenage children can tell you about generational language differences. At one time my son and his friends frequently used the expression ‘you just got pwned’ or ‘he pwned me’, usually but not exclusively when gaming. It describes being decisively and unambiguously beaten by a competitor. ‘Pwned’ is a corruption of ‘owned’, attributed to a misspelling by a World of Warcraft map designer, and for some reason it fell into common usage. Unlike much of what we deal with in information systems, there is no rule and no derivation; it is simply something which is known. Without this knowledge, what would a social media monitoring platform make of the tweet ‘coke pwns pepsi’ (or the other way around, of course)?
Other differences are equally obscure. Take emoticons. Baby boomers rarely use them, Gen-Xers commonly use them and Gen-Yers use them, but differently. A Gen-Xer is more likely to type :-) and a Gen-Yer :) with no nose. Very little difference to the human eye, but to a traditional text filter they simply don’t match.
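To make the emoticon point concrete, here is a minimal sketch of how a filter might fold smiley variants into one form before matching. It assumes the two variants differ only by an optional nose, as in ':-)' and ':)'. The pattern and the function name normalize_smileys are illustrative inventions, not taken from any particular platform.

```python
import re

# Hypothetical pattern: eyes (: or =), an optional nose (-), a smiling mouth.
# It covers only a handful of smileys and is an assumption for illustration.
SMILEY = re.compile(r"[:=]-?[)\]]")


def normalize_smileys(text: str) -> str:
    """Replace every smiley variant the pattern recognises with ':)'."""
    return SMILEY.sub(":)", text)


# A naive string comparison treats these two posts as different...
print("love it :-)" == "love it :)")
# ...but after normalization they compare as equal.
print(normalize_smileys("love it :-)") == normalize_smileys("love it :)"))
```

A real platform would need a far larger inventory of emoticons, but the principle is the same: normalize first, then match.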
Many are a little surprised when I point out that the author’s gender makes a difference to the language used. Of course, women might be more likely to discuss hormone replacement therapies and men more likely to discuss male pattern baldness if they are blogging about their mid-life crisis, but even on a gender-neutral topic men and women still use different language. One website, Gender Genie, can identify the gender of the author of a piece of text with a surprisingly high degree of accuracy.
What does all of this mean? It means that social media analytics platforms have to understand the rich, inconsistent and unfathomable ways in which we all converse. To get more specific and technical, social listening must employ linguistic variant sets to accurately disambiguate language variations. Simply put, it must be able to handle a set of alternative ways of saying the same thing. Social listening must be inclusive of all diversity, regardless of age, gender, ethnicity, social status, profession and yes, sexuality, before it can capture data suitable for the purpose of analytics. Otherwise, you might as well just call the whole thing off.
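For the technically minded, a linguistic variant set can be sketched as a simple lookup from each alternative spelling to one canonical term. This is a toy illustration under stated assumptions: the entries in VARIANT_SETS and the helper names canonicalize and mentions are hypothetical, and a production system would need much larger, curated sets and proper handling of context.

```python
import re

# Hypothetical variant sets: each canonical term maps to the surface forms
# that should be treated as the same thing. Entries are illustrative only.
VARIANT_SETS = {
    "coke": {"coke", "coca-cola", "cocacola"},
    "pepsi": {"pepsi", "pepsi-cola"},
    "beats": {"beats", "owns", "pwns"},
}

# Invert the sets into a variant -> canonical lookup table.
CANONICAL = {
    variant: canonical
    for canonical, variants in VARIANT_SETS.items()
    for variant in variants
}


def canonicalize(text: str) -> list[str]:
    """Lower-case the text, split it into tokens, and map known variants."""
    tokens = re.findall(r"[a-z0-9]+(?:-[a-z0-9]+)*", text.lower())
    return [CANONICAL.get(token, token) for token in tokens]


def mentions(text: str, brand: str) -> bool:
    """Report whether the text refers to the brand under any known variant."""
    return brand in canonicalize(text)


print(canonicalize("Coke pwns Pepsi"))
print(mentions("Coca-Cola owns the summer", "coke"))
```

With the variants mapped, ‘coke pwns pepsi’ and ‘Coca-Cola beats Pepsi-Cola’ reduce to the same three terms, which is what lets a filter count both as the same kind of mention.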
Also reproduced for IBM Vision for the IT expert community.