

Successful data science starts with integrating data from diverse sources. Most businesses, by their own admission, are not maximising the potential of their data. According to Accenture, only 25 percent of companies habitually rely on analytics in their decision-making process, even though 62 percent believe that analytics make for “quicker and more effective decision-making.”
Data integration is often one of the most significant challenges organisations face when developing advanced analytics capability. Even companies that do not consciously think of themselves as data-oriented will frequently capture unstructured data from many different sources, including emails, chatbots, social media, phone call recordings, user feedback forms, and many others. This means that even before the advent of ‘big data’, many businesses were building up considerable stores of business-critical information.
This unstructured data, however, is frequently neglected or under-utilised, despite its potential to transform business performance. It is seen as large, difficult to work with using standard analytical toolsets and often of uncertain quality. The pre-processing required before analysis can even start is enough to put off or intimidate many analysts. Nevertheless, pulling all these diverse sources together to create a single, unified view of a company and its customers is critical to building a data-driven organisation.
Major unstructured data sources can include:
Clickstream data – the highly sequential, transaction-like records generated by website interactions can reveal a great deal about customers and their online journey. For example, in the retail sector, is there a pattern of people adding an item to their cart and then abandoning the purchase? Understanding the sequence of events that led to this action can help inform an appropriate response.
Free text data – chatbots, emails, social media and customer reviews are all good sources of textual data. Because this data is keyed in by individuals, however, the same idea can be expressed with any number of different words, symbols and abbreviations, and its meaning often depends on context. This can make it difficult for a computer to extract meaningful information.
Media (audio, image and video) data – audio streams (such as recorded phone calls) can often provide a key source of insight into customer complaints and common problems with your product. Video data can be taken from sources such as drones or CCTV footage and can be invaluable for numerous tasks such as managing security in physical premises. Deep learning techniques can uncover complex patterns in large amounts of media data, for applications such as image and speech recognition.
Sensors and Internet of Things (IoT) data – this is a large and fast-growing category that encompasses data from sources including cars, mobile phones, in-home devices and wearable technology.
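To make the clickstream example above more concrete, here is a minimal Python sketch of how cart abandonment might be detected from a sequence of events. The event names, session IDs and the (session, timestamp, event) layout are all assumptions made for illustration; real clickstream schemas vary widely.

```python
from collections import defaultdict

# Hypothetical clickstream events: (session_id, timestamp, event_type).
EVENTS = [
    ("s1", 1, "view_item"),
    ("s1", 2, "add_to_cart"),
    ("s1", 3, "view_shipping"),
    ("s2", 1, "view_item"),
    ("s2", 2, "add_to_cart"),
    ("s2", 3, "checkout"),
    ("s2", 4, "purchase"),
    ("s3", 1, "view_item"),
]


def abandoned_sessions(events):
    """Map each abandoning session to the events that followed its first add_to_cart.

    A session counts as abandoned here if it contains an add_to_cart event
    but no purchase event (an assumed, simplified definition).
    """
    by_session = defaultdict(list)
    for session_id, timestamp, event_type in events:
        by_session[session_id].append((timestamp, event_type))

    result = {}
    for session_id, rows in by_session.items():
        # Order each session's events by timestamp to recover the sequence.
        sequence = [event for _, event in sorted(rows)]
        if "add_to_cart" in sequence and "purchase" not in sequence:
            first_add = sequence.index("add_to_cart")
            result[session_id] = sequence[first_add + 1:]
    return result


abandoned = abandoned_sessions(EVENTS)
print(abandoned)
```

In this toy data only session s1 is flagged, and the events recorded after the item was added to the cart hint at where the customer dropped out.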
Much of the transformational power of data science comes from layering these data sources on top of each other. Combining traditional structured information with customer clickstreams, social media, emails, call recordings and IoT data can provide a formidable source of intelligence on which potential customers will be most attracted to your product or service, when and how they will buy, how best to engage them, and how to avoid disappointing them.
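As a rough illustration of this layering, the sketch below joins structured customer records with simple counts derived from two unstructured sources. The customer IDs, field names and the idea that each source has already been reduced to a list of customer IDs are assumptions for the example, not a description of any particular system.

```python
from collections import Counter

# Hypothetical structured CRM records keyed by customer ID.
CRM = {
    "c1": {"name": "Alice", "segment": "retail"},
    "c2": {"name": "Bob", "segment": "trade"},
}

# Hypothetical features already extracted from unstructured sources:
# one customer ID per abandoned cart and per complaint call.
ABANDONED_CARTS = ["c1", "c1", "c2"]
COMPLAINT_CALLS = ["c2"]


def unified_view(crm, abandoned_carts, complaint_calls):
    """Layer per-source counts on top of each structured customer record."""
    carts = Counter(abandoned_carts)
    calls = Counter(complaint_calls)
    return {
        customer_id: {
            **record,
            # Counter returns 0 for customers with no events in a source.
            "abandoned_carts": carts[customer_id],
            "complaint_calls": calls[customer_id],
        }
        for customer_id, record in crm.items()
    }


view = unified_view(CRM, ABANDONED_CARTS, COMPLAINT_CALLS)
for customer_id, record in view.items():
    print(customer_id, record)
```

Keying every source on a shared customer identifier is the design choice that makes the single view possible; in practice, matching identities across sources is often the hardest part of the integration.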