Implementing a Data Lake in the Cloud

Cloudera Data Lake on Azure // Custom Predictive Modelling




Rob Dielemans

Our role

Cloudera Data Lake on Azure // Custom Predictive Modelling

Photo: Nuon/Jorrit Lousberg

After an initial Data Discovery Workshop, Nuon continued the cooperation with GoDataDriven to implement a Data Lake based on Cloudera on Azure and to kickstart their Data Science practice. For the implementation of the Data Lake, GoDataDriven worked together with two other organizations from the Xebia Group: XITA and Xpirit.

Educating the Organization

For the Nuon organization, Big Data and associated terms like Cloud, R, Hadoop clusters, Python, and Data Lake were new. To explain these terms and their benefits to the organization, Nuon and GoDataDriven produced an animated video that is now being narrowcast within the Nuon offices. Alongside insight into the many opportunities to improve customer service, this video also explains the new infrastructure and the benefits of open source technology.

A Data Lake to Store All Data Centrally

Data stored in databases and spreadsheets, combined with some CRM activities, used to be common practice in many organizations. With the introduction of Big Data technology it has become possible to save all information, categorise it, analyse it, and use it in real time. Data no longer comes only from neatly structured tables, but also from many (unstructured) sources, including phone calls, chats, pictures, videos, reports, bills, email, surfing behaviour, and social media accounts. For Nuon, everything is information.

"Big Data is about different types of data, structured data and non-structured data. Very important are the speed at which you receive it, process it, and how quickly you can make it available for other purposes", Alexander Bij, Big Data Hacker at GoDataDriven, explains.

Data Lake Based on Hadoop Technology

In the Data Lake, Hadoop technology is used to process data. Hadoop runs on a cluster of computers working together: both the storage of data and the processing are divided across multiple machines. Together, they can handle far more data and computation than a single machine. The Data Lake continues to grow, and is everywhere.
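The idea of dividing storage and processing across machines can be illustrated with a minimal Python sketch. This is not Nuon's code and it does not use Hadoop itself: the partitions stand in for machines, and the record layout (a contact channel plus a customer id) and all function names are assumptions made for illustration only.

```python
from collections import Counter
from functools import reduce


def split_into_partitions(records, n_partitions):
    """Spread records round-robin over partitions (hypothetical stand-ins for machines)."""
    return [records[i::n_partitions] for i in range(n_partitions)]


def map_partition(partition):
    """Work done locally on one partition: count contacts per channel."""
    return Counter(channel for channel, _customer_id in partition)


def merge_counts(left, right):
    """Combine two partial results into one."""
    return left + right


def count_by_channel(records, n_partitions=3):
    """Count contacts per channel by processing each partition separately, then merging."""
    partitions = split_into_partitions(records, n_partitions)
    # On a real cluster these partial counts would be computed in parallel,
    # each on the machine that stores the partition.
    partial_counts = [map_partition(p) for p in partitions]
    return reduce(merge_counts, partial_counts, Counter())


if __name__ == "__main__":
    contacts = [
        ("phone", "c1"), ("chat", "c2"), ("phone", "c3"),
        ("email", "c4"), ("chat", "c5"), ("phone", "c6"),
    ]
    print(count_by_channel(contacts))
```

Each partition is processed without looking at the others, and only the small partial results are merged at the end. That independence is what allows a cluster to add machines as the data grows.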

You might picture an actual lake, but the data is stored in the Azure cloud, where storage is cheaper. The data lake is almost entirely managed with open source software. Together with GoDataDriven, the Nuon analysts write code in R and Python to analyse data, and convert it into useful information.

Developing Customer Focused Data Solutions

“Big Data within Nuon means that we have a lot of data about our customers. We know who our customers are, we know which energy services they use, we know when they are most likely to contact us, and with the smart meter data, we know a lot about our customers' consumption patterns. By applying all that knowledge in a smart way, we can improve the customer experience”, says Rixt Altenburg, Manager Customer Insights at Nuon.

"Yes, we have a lot of information about our customers. Now, when we detect an unexpected increase in consumption, we can inform our customers proactively, and help them avoid unusually high invoices", Rens Weijers, Data & Performance Management at Nuon, adds.

Alexander Bij, Big Data Hacker at GoDataDriven, adds: "There is a lot more. Through good analyses and machine learning, we can figure out what questions customers have and help them before they call the Nuon customer service."

Weijers concludes: "When a customer moves, and we pick up on that, we can provide the individual customer with relevant information and advice in advance. Now that is what I call relevant and personal!"
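As an illustration of the proactive alert Weijers describes, the sketch below flags a meter reading that is unusually high compared with a customer's recent history. It is a hedged example, not the model Nuon or GoDataDriven built: the daily kWh readings, the function name, and the threshold of three standard deviations are all assumptions.

```python
from statistics import mean, stdev


def is_unexpected_increase(history, latest, threshold=3.0):
    """Return True if `latest` lies more than `threshold` standard deviations
    above the mean of `history` (hypothetical daily consumption in kWh)."""
    if len(history) < 2:
        # Too little history to estimate normal variation.
        return False
    mu = mean(history)
    sigma = stdev(history)
    if sigma == 0:
        # Perfectly flat history: any rise counts as unexpected.
        return latest > mu
    return (latest - mu) / sigma > threshold


if __name__ == "__main__":
    recent_days = [10, 11, 9, 10, 12, 10, 11]
    print(is_unexpected_increase(recent_days, 25))
    print(is_unexpected_increase(recent_days, 11))
```

A production system would likely account for season, weather, and household type. The simple rule here only shows the principle of comparing new consumption with what is normal for that customer.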

Becoming More Personal and Relevant

Data is used not only for Nuon’s own benefit, but especially for the benefit of its customers. "Purchasing energy is very important to us, especially aligning it with actual customer demand. We are located across from the Amsterdam ArenA, a beautiful stadium, and we know when its home team Ajax plays its matches and when concerts are scheduled. When we really begin integrating this data with our Data Lake, we can purchase energy more effectively. This will lead to better deals for our customers", Weijers says.

Altenburg adds: "When we notice that customers contact us frequently after visiting certain webpages, we can conclude that these pages are not clear enough, and that we should improve them. And that is what it's all about. With the Data Lake and the DataDriven application, Nuon becomes the most personal and relevant organization for its customers. Customers enjoy the services without much difficulty. Two million customers who are enthusiastic about the services make Nuon the most recommended energy supplier of the Netherlands."

Technology we used

Data is very important for Nuon. We started working together with GoDataDriven to combine all data in the data lake and to start developing DataDriven solutions.

Rixt Altenburg
Manager Customer Insights