THE FIRST VIEW:
Published: November 3, 2023
The use of data to drive and augment decision-making has been with us for a long time, but the new paradigm of big data delivers huge quantities of data with far greater granularity, at far greater frequency and across far more factors. It can make a huge difference to any business that learns to use this flow of data to optimise costs, spot market opportunities, price dynamically or react faster than a competitor.
The trick is getting to that point, as the pathway contains several separate challenges, each with a different resolution. Firstly, you need to collect your data. If you are dealing with IoT devices, this brings one set of challenges; if you are dealing with processed business data, it brings another. Of course, you will also want to enhance all of that with structured and/or unstructured data from related events such as the weather, news, social media or price data. All of this will require a time series database, as the timeline is important.
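To illustrate why the timeline matters when enriching data, here is a minimal sketch of matching each sensor reading with the most recent weather observation at or before it. The sample values, field names and the helper `latest_at_or_before` are hypothetical and chosen only for illustration; a time series database would normally provide this kind of lookup itself.

```python
from bisect import bisect_right
from datetime import datetime


def latest_at_or_before(times, values, when):
    """Return the value whose timestamp is the latest one <= when, or None.

    `times` must be sorted in ascending order and aligned with `values`.
    """
    i = bisect_right(times, when)
    return values[i - 1] if i else None


# Hypothetical hourly weather observations (ambient temperature in Celsius).
weather_times = [datetime(2023, 11, 3, 9, 0), datetime(2023, 11, 3, 10, 0)]
weather_temps_c = [4.0, 6.5]

# Hypothetical pump temperature readings arriving at irregular times.
readings = [
    (datetime(2023, 11, 3, 8, 30), 71.2),
    (datetime(2023, 11, 3, 9, 45), 73.9),
    (datetime(2023, 11, 3, 10, 0), 74.1),
]

# Enrich each reading with the weather that was known at that moment.
enriched = [
    {
        "time": t,
        "pump_temp_c": temp,
        "ambient_temp_c": latest_at_or_before(weather_times, weather_temps_c, t),
    }
    for t, temp in readings
]
```

The first reading predates any weather observation, so its ambient value is left as `None` rather than guessed.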
Processed business data has its own challenges. It needs to be clean to be useful, and you really need to know its sources, that is, the data lineage. In complex organisations, data is often taken and processed into an output that may then be aggregated further. Understanding what data you are working with is critical.
The next big challenge is what to do with it and how to make sense of it. This is where data science comes in: the data sets and their relationships need to be analysed and a data model constructed. The key indicators in the data are often buried within interrelated triggers. For example, pump A running hot is not a significant event by itself, but if the oil pressure is also low and other factors are present, it might indicate imminent catastrophic failure. The data science team finds these relationships and continuously refines them over time. This is not a one-off exercise: you can do enough to get going, but to derive greater value you need to divert budget to ongoing data science.
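The pump example can be sketched as a compound rule in which no single signal raises the alarm on its own. This is only an illustration: the thresholds, the choice of vibration as the "other factor", and the names `PumpReading` and `assess` are assumptions, not values from a real system.

```python
from dataclasses import dataclass


@dataclass
class PumpReading:
    temperature_c: float
    oil_pressure_bar: float
    vibration_mm_s: float


# Hypothetical thresholds; in practice the data science team would derive
# and refine these from historical data.
HOT_C = 85.0
LOW_OIL_BAR = 1.5
HIGH_VIBRATION_MM_S = 7.0


def assess(reading: PumpReading) -> str:
    """Classify a reading by combining interrelated triggers."""
    hot = reading.temperature_c > HOT_C
    low_oil = reading.oil_pressure_bar < LOW_OIL_BAR
    shaking = reading.vibration_mm_s > HIGH_VIBRATION_MM_S

    if hot and low_oil and shaking:
        return "critical"  # combination may indicate imminent failure
    if hot and low_oil:
        return "warning"   # two related signals, worth a closer look
    return "normal"        # running hot by itself is not significant
```

For instance, `assess(PumpReading(90.0, 3.0, 2.0))` returns `"normal"` because the pump is hot but nothing else is wrong, while the same temperature with low oil pressure and high vibration returns `"critical"`.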
Once you know the factors you are looking for, you need the data science team to build the algorithms that constantly monitor the data flows in real time. But what happens when you hit a trigger? Do you want an automated response or alerts to key operatives? If you are going to alert someone, how are you going to do it? The alert needs to be effective and timely, bearing in mind that it might be delivered to someone at a desk but equally to someone out in the field wearing a hard hat and gloves. Making this delivery or decisioning mechanism effective is very important.
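One way to picture the decisioning step is as a small routing function that picks a response based on severity and on where the recipient is. The policy below is purely an assumption for illustration; the severity levels, locations and channel names are hypothetical, and the right choices depend on the plant and the people involved.

```python
def route_alert(severity: str, recipient_location: str) -> str:
    """Choose a response channel for a triggered rule (hypothetical policy)."""
    if severity == "critical":
        # Assumed policy: act automatically when there is no time to wait.
        return "automated_response"
    if recipient_location == "field":
        # Someone in a hard hat and gloves cannot easily read a small screen,
        # so prefer a channel that needs no hands.
        return "audible_alarm"
    # Default for someone working at a desk.
    return "dashboard_notification"
```

Keeping the policy in one place makes it easier to review and adjust as operatives report which alerts actually reached them in time.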
As indicated, we need continuous feedback loops to refine and improve the data science and algorithms. Going back to the pump example, the scenario might be entirely acceptable if the pump is performing a particular task within a particular time frame. That kind of contextual narrative helps to define better detection and reduce false positives, but building it into the algorithm will require more data sources.
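A minimal sketch of such context-aware detection follows, assuming an extra data source that reports which task the pump is performing and when it started. The task name, the 30-minute allowance and the function `should_alert` are hypothetical.

```python
from datetime import datetime, timedelta

# Hypothetical context: tasks during which running hot is expected,
# and how long that is considered acceptable.
EXPECTED_HOT_TASKS = {"high_load_transfer": timedelta(minutes=30)}


def should_alert(is_hot: bool, task: str, task_started: datetime, now: datetime) -> bool:
    """Suppress the hot-pump alert when the operating context explains it."""
    if not is_hot:
        return False
    allowed = EXPECTED_HOT_TASKS.get(task)
    if allowed is not None and now - task_started <= allowed:
        return False  # expected behaviour for this task and time frame
    return True


start = datetime(2023, 11, 3, 9, 0)
within_window = should_alert(True, "high_load_transfer", start, start + timedelta(minutes=10))
past_window = should_alert(True, "high_load_transfer", start, start + timedelta(minutes=45))
```

Here `within_window` is `False` because the heat is explained by the task, while `past_window` is `True` because the pump has stayed hot longer than the task allows.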
To make all of this real: the pump example is not fictitious. A few years ago we built pump monitoring analysis for a client. The project had the good fortune of paying for itself within two months, as it predicted a potential major plant failure that was caught in time, and the plant was brought back online quickly with simple routine maintenance. The plant engineers acknowledged that the equipment would have failed spectacularly had it been allowed to continue running.
All of this also opens up AI. Without the right data sources, modelling and algorithms, the ability to apply AI is severely limited. Big data feels a little like the early days of the technology boom: we gradually had to dedicate more resources to it, but the value derived was huge.