Implementing a Data Lake in the Cloud

Photo: Vattenfall/Jorrit Lousberg

Customer challenge: Structuring and unlocking the available data to improve customer insights and experience.

Provided solution: Implementing a Data Lake in the cloud that combines all available data into one source.

Outcome: The platform was used for churn prediction, signalling to customer service agents which customers to contact. This application led to reduced churn rates.

After an initial Data Discovery Workshop, Vattenfall continued the cooperation with GoDataDriven to implement a Data Lake based on Azure and to kickstart their Data Science practice.

Educating the Organization

For the Vattenfall organization, data and associated terms like cloud, R, Hadoop clusters, Python, and Data Lakes were new. To explain these terms and their benefits to the organization, Vattenfall and GoDataDriven produced an animated video that is now being narrowcast within the offices. Alongside an insight into the many opportunities to improve customer service, this video also explains the new infrastructure and the benefits of open source technology.

A Data Lake to Store All Data Centrally

Data stored in databases and spreadsheets, combined with some CRM activities, used to be common practice in many organizations. With the introduction of Big Data technology it has become possible to save all information, categorise it, analyse it, and use it in real time. Data no longer comes only from neatly structured tables, but from many (unstructured) sources too, including phone calls, chats, pictures, videos, reports, bills, email, surfing behaviour, and social media accounts. For Vattenfall, everything is information.

“Big Data is about different types of data, structured data and non-structured data. Very important are the speed at which you receive it, process it, and how quickly you can make it available for other purposes”, Alexander Bij, Big Data Hacker at GoDataDriven, explains.

Data Lake Based on Hadoop Technology

In the Data Lake, Hadoop technology is used to process data. Hadoop runs on a cluster of computers working together: both the storage and the processing of data are divided across multiple machines. Together, they can handle far more data and computations than a single machine could. The Data Lake continues to grow, and is everywhere.
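The idea of dividing work across machines can be illustrated with a small sketch in plain Python. This is not Hadoop itself and not Vattenfall's code: the readings, the customer IDs, and the function names are all made up for illustration, and the "machines" are simulated as chunks of a list. Each chunk is summarised independently (the map step), and the partial results are then merged (the reduce step).

```python
from functools import reduce

# Hypothetical meter readings as (customer_id, kWh). Illustrative data only.
READINGS = [
    ("c1", 3.0), ("c2", 1.5), ("c1", 2.0),
    ("c3", 4.0), ("c2", 2.5), ("c3", 1.0),
]


def partition(records, n_workers):
    """Split the records into n_workers chunks, one per simulated machine."""
    return [records[i::n_workers] for i in range(n_workers)]


def map_partition(chunk):
    """Map step: a worker totals consumption for the records it holds."""
    totals = {}
    for customer_id, kwh in chunk:
        totals[customer_id] = totals.get(customer_id, 0.0) + kwh
    return totals


def merge(left, right):
    """Reduce step: combine two partial results into one."""
    merged = dict(left)
    for customer_id, kwh in right.items():
        merged[customer_id] = merged.get(customer_id, 0.0) + kwh
    return merged


def total_per_customer(records, n_workers=3):
    """Total consumption per customer, computed chunk by chunk."""
    partials = [map_partition(chunk) for chunk in partition(records, n_workers)]
    return reduce(merge, partials, {})


if __name__ == "__main__":
    print(total_per_customer(READINGS))
```

The result does not depend on how many chunks the data is split into, which is what allows a real cluster to add machines as the data grows.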

You might picture an actual lake, but the data is stored in the Azure cloud, where storage is cheaper. The data lake is almost entirely managed with open source software. Together with GoDataDriven, the Vattenfall analysts write code in R and Python to analyse data, and convert it into useful information.

Developing Customer Focused Data Solutions

“Big Data within Vattenfall means that we have a lot of data about our customers. We know who our customers are, we know which energy services they use, we know when they are most likely to contact us and, with the smart meter data, we know a lot about our customers’ consumption patterns. By applying all that knowledge in a smart way, we can improve the customer experience”, says Rixt Altenburg, Manager Customer Insights at Vattenfall.

“Yes, we have a lot of information about our customers. Now, when we detect an unexpected increase in consumption, we can inform our customers proactively, and help them avoid unusually high invoices”, Rens Weijers, Manager Data & Strategy at Vattenfall, adds.
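One simple way to flag an unexpected increase in consumption is to compare the latest reading with a customer's own history. The sketch below is an assumption for illustration, not the method Vattenfall uses: the function name, the threshold of three standard deviations, and the sample readings are all hypothetical.

```python
from statistics import mean, stdev


def is_unexpected_increase(history, latest, threshold=3.0):
    """Return True when `latest` lies more than `threshold` standard
    deviations above the mean of `history` (hypothetical rule)."""
    if len(history) < 2:
        # Too little history to estimate a spread.
        return False
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # Perfectly flat history: any rise counts as an increase.
        return latest > mu
    return (latest - mu) / sigma > threshold


if __name__ == "__main__":
    weekly_kwh = [10, 11, 9, 10, 10, 11, 9]  # made-up readings
    print(is_unexpected_increase(weekly_kwh, 20))
    print(is_unexpected_increase(weekly_kwh, 11))
```

A production system would also need to account for seasonality and weather, which this sketch ignores.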

Alexander Bij, Big Data Hacker at GoDataDriven, adds: “There is a lot more. Through good analyses and machine learning, we can figure out what questions customers have and help them before they call the Vattenfall customer service.”

Weijers concludes: “When a customer moves, and we pick up on that, we can provide the individual customer with relevant information and advice in advance. Now that is what I call relevant and personal!”

Becoming More Personal and Relevant

Data is used not only for Vattenfall’s own benefit, but especially for their customers. “Purchasing energy is very important to us. Especially aligning it with actual customer demand. We are located across from the Amsterdam ArenA, a beautiful stadium of which we know when their home team Ajax plays their matches, or when concerts are scheduled. When we really begin integrating this data with our Data Lake, we can purchase energy more effectively. This will lead to better deals for our customers”, Weijers says.

Altenburg adds: “When we notice that customers contact us frequently after visiting certain webpages, we can draw the conclusion that these pages are not clear enough, and we should improve them. And that is what it’s all about.

With the Data Lake and the DataDriven application, Vattenfall becomes the most personal and relevant organization for their customers. Customers enjoy the services without much difficulty. Two million customers who are enthusiastic about the services make Vattenfall the most recommended energy supplier of the Netherlands.”


Public | Utilities

Project type

Azure Data Lake
Custom Predictive Modelling

“By developing solutions based on the data in the Data Lake, we can inform our customers proactively and help them to optimize their energy consumption.”

Rens Weijers, Manager Data and Strategy