Data-Centric AI at Schiphol
Traditionally, data scientists have taken the algorithm as a starting point to generate business insights. As more and better models come onto the market, there’s a growing awareness of model-driven AI’s limitations. This is giving way to an approach where data scientists work with the data to optimize a model’s performance. In this article, we take a look at such a data-centric AI project: Deep Turnaround at Schiphol.
Model-Driven versus Data-Centric AI
Better, faster decision-making, more efficient processes, and lower costs: these are the promises of artificial intelligence (AI) and a data-driven way of working. To arrive at these insights, data scientists have traditionally taken the algorithm as the starting point. They collect and clean up the data, feed it to the model, then look at the results. If the performance lags or the analysis doesn’t provide the insight they hoped for, they spend a lot of time fine-tuning the model further or might use a different algorithm altogether. In other words, the better the algorithm, the better the results, or at least that’s been the prevailing school of thought.
But as more and better models come onto the market, there’s a growing awareness among the data science community of model-driven AI’s limitations. “Traditionally, data scientists tend to tinker with the model to make it robust against errors or ‘noise’ in the data,” explained Marcel Raas, a data scientist at GoDataDriven, “but this model-centric approach is giving way to a data-centric approach, one where we actively work with the data instead.”
Raas was part of a team at Schiphol Airport that worked on the optimization of the turnaround process. To Raas, this project exemplifies such a data-centric solution.
Schiphol Airport Turnaround Process Optimization
As soon as an aircraft arrives at an airport, it is quickly prepared for departure. Because this requires a sequence of events in a short time, the plane’s “turnaround process” is tightly choreographed: passengers disembark, workers unload luggage, mechanics perform maintenance and refuel it, while airline staff clean and disinfect it. The smallest disruption in the turnaround process can create a domino effect, leading to potential delays.
To optimize this process, a team at Schiphol that included GoDataDriven’s Marcel Raas developed a deep-learning solution that translates real camera images from the aircraft stands (VOPs) into usable data. This data provides insight into the various sub-processes (such as refueling, pushback, cleaning, and catering) and helps Schiphol predict and prevent delays.
“Schiphol feeds huge amounts of data into the model, real-time data from the cameras to make current predictions and historical data to reveal new opportunities for optimization. Three annotators ensure that all of the data is constantly consistent.” – Marcel Raas, data scientist at GoDataDriven
By detecting human errors in the annotation process, determining which data is most instructive for the model, and annotating only that data, the team can ensure that uncertainties are filtered out. A feedback loop also ensures that the quality of the algorithm continuously improves.
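The article does not say how the team decides which data is most instructive. One common approach is uncertainty sampling: rank unlabeled items by how unsure the model is about them and send only the top few to the annotators. The following is a minimal sketch under that assumption. The frame ids, the probabilities, and the function names `entropy` and `select_for_annotation` are hypothetical and are not taken from the Deep Turnaround project.

```python
import math


def entropy(probs):
    """Shannon entropy (in bits) of one predicted class distribution."""
    return -sum(p * math.log2(p) for p in probs if p > 0)


def select_for_annotation(predictions, budget):
    """Return the ids of the `budget` most uncertain items.

    `predictions` maps a (hypothetical) frame id to the class
    probabilities the model predicted for that frame.
    """
    ranked = sorted(
        predictions,
        key=lambda frame_id: entropy(predictions[frame_id]),
        reverse=True,  # highest entropy (most uncertain) first
    )
    return ranked[:budget]


# Illustrative, made-up model outputs for three camera frames.
predictions = {
    "frame_001": [0.98, 0.01, 0.01],  # model is confident
    "frame_002": [0.40, 0.35, 0.25],  # model is very unsure
    "frame_003": [0.70, 0.20, 0.10],  # model is somewhat unsure
}

to_annotate = select_for_annotation(predictions, budget=2)
print(to_annotate)
```

With a budget of two, the confident frame is skipped and annotator time goes to the two frames the model struggles with most.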
Monitoring also plays a major role in improving data sets and predictions. Sudden “uncertainties” may indicate an underlying data problem that merits further analysis. With the Schiphol project, for example, a sudden deviation in the forecasts could indicate fog or a defective camera. Monitoring the results continuously and in real time ensures that any problems with the data are quickly identified.
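To make the idea concrete, here is a minimal sketch of one way such monitoring could work, assuming the model emits a confidence score per prediction. It keeps a rolling window of recent scores and raises an alert when their average drops well below an expected baseline. The class name `ConfidenceMonitor`, the baseline, the window size, and the threshold are all illustrative assumptions, not details of the Schiphol system.

```python
from collections import deque
from statistics import mean


class ConfidenceMonitor:
    """Flags a sustained drop in prediction confidence."""

    def __init__(self, baseline, window=5, max_drop=0.15):
        self.baseline = baseline      # expected average confidence
        self.max_drop = max_drop      # tolerated drop below the baseline
        self.recent = deque(maxlen=window)

    def update(self, confidence):
        """Record one score; return True if an alert should fire."""
        self.recent.append(confidence)
        if len(self.recent) < self.recent.maxlen:
            return False  # not enough data yet to judge
        return self.baseline - mean(self.recent) > self.max_drop


# Made-up stream: confidence collapses after the third prediction,
# as it might when fog rolls in or a camera fails.
monitor = ConfidenceMonitor(baseline=0.90, window=3, max_drop=0.15)
scores = [0.91, 0.89, 0.92, 0.60, 0.55, 0.58]
alerts = [monitor.update(score) for score in scores]
print(alerts)
```

Averaging over a window rather than reacting to a single low score is a deliberate choice here: one odd frame should not page anyone, but a sustained drop should.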
Monitoring can also contribute to improvements in products and services in other industries, as it provides insight into customer motivation. For example, an AI model could predict churn by analyzing customers who cancel their subscriptions to a particular service or product. If the model assigns a specific customer a value somewhere between “0” (certain to cancel) and “1” (certain to stay), what causes that uncertainty? Did the customer interact with the organization after taking out the subscription? If not, why not? Could the organization keep this customer by making them a well-timed new offer?
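As a rough illustration of the churn example, the sketch below picks out the customers whose score sits near the middle of the 0-to-1 range, where the model is least sure, and lists the most uncertain first. The customer ids, the scores, the band limits, and the function name `uncertain_customers` are hypothetical.

```python
def uncertain_customers(scores, low=0.35, high=0.65):
    """Return ids of customers with a score in the uncertain band,
    ordered from most to least uncertain (closest to 0.5 first)."""
    in_band = [cid for cid, score in scores.items() if low <= score <= high]
    return sorted(in_band, key=lambda cid: abs(scores[cid] - 0.5))


# Made-up model scores per customer.
scores = {
    "cust_a": 0.05,
    "cust_b": 0.48,
    "cust_c": 0.93,
    "cust_d": 0.61,
}

review = uncertain_customers(scores)
print(review)
```

The customers in this list are the ones worth a closer look: their interaction history may explain the uncertainty, and they are natural candidates for a well-timed offer.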
No Garbage In, No Garbage Out: Data Points the Way
“Data quality has always been an important factor in data science, but in the past five to ten years, with increasingly sophisticated and accessible tooling, it has become increasingly easy to build a predictive model, so the emphasis on data quality has become more pronounced. More value is added when you maximize the quality of the underlying data, so improving data quality should be the number one priority for any organization. No garbage in, no garbage out.” – Rens Dimmendaal, data scientist at GoDataDriven
“Training a machine learning model and putting it into production has become very simple,” said Raas, “and with so many accessible ML tools on the market, like Azure Machine Learning and MLflow, that provide error analysis and monitoring, data scientists are free to focus on optimizing, understanding, and conforming a company’s data. So, in the end, the surest way to the best model is by feeding it top-quality data. In other words, data points the way.”
Get in Touch!
Contact Marcel Raas if you want to know more about data-centric AI. He’ll be happy to help you!