Customer challenge: Become a data-driven organization that develops applications that help customers save energy, while respecting their privacy.
Provided solution: Support in-house data science team to operationalize a Hadoop cluster. Improve team skills through Data Science Accelerator Program.
Outcome: Adopted a new way of working. Launched the first version of the energy platform, which the in-house team has since built upon.
GoDataDriven gave us the extra rocket boost we needed to get to the next level quickly – Stephen Galsworthy, Head of Data Science
Quby, the creator of the smart thermostat and in-home data service platform Toon, aims to help reduce the amount of energy that hundreds of thousands of households unknowingly waste. The company believes that smart data-driven solutions are the key to reaching this goal. That’s why Quby chose to work with Amsterdam-based data science consultancy GoDataDriven on the introduction of smart data-driven solutions, using Databricks on Amazon Web Services.
The installation of Toon has been shown to lead to 10% savings on household energy bills, while the recently introduced Waste checker has already shown its potential to drastically increase those savings even further.
Stephen Galsworthy, head of data science at Quby, explained, “GoDataDriven’s team is very experienced in developing scalable, robust, and tested data products that can be taken into production easily. By working closely together and building on this experience we were able to achieve our goals significantly faster.”
Becoming Data-Driven: The First Steps
When Quby decided to become a data-driven organization, they realized that the first step was to ensure data quality and to make data available for a small group of consenting users. Quby changed the way that data was stored: from locally inside the smart thermostat itself to centrally in a Hadoop cluster.
Privacy
Customer privacy is very important to Quby. Users of the Waste checker service give permission for their data to be collected, and Quby made sure to be GDPR-compliant at a very early stage.
Quby’s data science team collects the data of individual users without knowing their identity. The Toon device locally generates a key for the collected data to identify the data as coming from a unique source. The end users maintain control over this key at all times since it is stored in their Toon. Quby can only use identifiable data when a user consents to it by opting in for a specific service.
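The exact key scheme used by Toon is not described here. As a minimal sketch of the general idea, assuming a random device-local key and an HMAC-derived pseudonym (the `Device` class, `source_id`, and `may_use_identifiable_data` are hypothetical names for illustration only), pseudonymous collection with opt-in consent could look like this:

```python
import hashlib
import hmac
import secrets


class Device:
    """Hypothetical stand-in for a thermostat that holds its own key."""

    def __init__(self):
        # Assumption: the key is generated locally and stays on the device.
        self._key = secrets.token_bytes(32)
        # Services the user has explicitly opted in to.
        self.opted_in_services = set()

    def source_id(self):
        # Stable pseudonym derived from the key; it does not reveal the key
        # or anything about the household.
        return hmac.new(self._key, b"source-id", hashlib.sha256).hexdigest()

    def reading(self, watts):
        # Readings carry only the pseudonym, never a name or address.
        return {"source": self.source_id(), "watts": watts}


def may_use_identifiable_data(device, service):
    # Identifiable data is only available for services the user opted in to.
    return service in device.opted_in_services
```

In this sketch, all readings from one device share the same pseudonym, so they can be grouped per source, while linking them to a person requires an explicit opt-in for a specific service.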
Centralizing Data in Hadoop
The initial data science team, consisting of data scientists employed by Quby and complemented by data engineers and data scientists from GoDataDriven, operationalized this Hadoop cluster by installing data science tools such as Spark, JupyterHub, and Python notebooks. Furthermore, GoDataDriven set up the data science workflow that enabled all team members to work together on the same reusable code.
For Stephen Galsworthy, working with GoDataDriven to accelerate the development of data-driven solutions was a logical move: “GoDataDriven gave us the extra rocket boost we needed to get to the next level quickly. Their integrated approach of knowledge sharing provided real added value as it allowed the organization to absorb lots of information”.
Data Science Accelerator
Besides collaborating with Quby’s team on a daily basis, GoDataDriven’s ten-day Data Science Accelerator Program was also an important element in establishing a true data science culture within Quby.
Support from the Organization
Because the success of the smart applications depends largely on organizational support, the data science team focused on developing working models and proofs of concept, and regularly presented its progress to the rest of the organization. “From the start, many of our colleagues were enthusiastic. As the products advanced, we noticed that more and more people from the Quby organization became convinced of the potential of smart applications to reach the organizational goals”, said Galsworthy.
Developing the Waste Checker
One of Quby’s core products is the Waste checker. This service provides insight into gas and electricity consumption at the household level, per appliance or type of behavior.
Developing White Good Use Cases
To start recognizing the specific consumption profile of appliances, Quby developed electricity disaggregation algorithms that turn energy meter signals into patterns of individual appliances.
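Quby's actual disaggregation algorithms are not described in this article. As a deliberately simple illustration of the underlying idea, the sketch below flags large step changes in a total power signal, which is one common starting point for spotting appliances switching on and off. The function name `detect_events`, the 500 W threshold, and the sample values are all assumptions made for the example.

```python
def detect_events(power_watts, threshold=500.0):
    """Return (index, delta) pairs where the signal jumps by at least `threshold` watts."""
    events = []
    for i in range(1, len(power_watts)):
        delta = power_watts[i] - power_watts[i - 1]
        if abs(delta) >= threshold:
            # A positive delta suggests an appliance switching on,
            # a negative delta suggests one switching off.
            events.append((i, delta))
    return events


# Made-up meter readings in watts: a roughly 2 kW load turns on, then off.
signal = [80, 82, 81, 2080, 2085, 2079, 83, 80]
events = detect_events(signal)
# events == [(3, 1999), (6, -1996)]
```

A real system would need to cope with noise, overlapping appliances, and multi-stage cycles, so matching such events to appliance profiles is considerably harder than this sketch suggests.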
Recognizing the Unique Profile of Specific Appliances
“By disaggregating signals, it becomes possible to distinguish a kettle from a washing machine. We use this insight to determine the energy efficiency of an appliance by comparing its consumption with other appliances in the same category”, explains Galsworthy.
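How the comparison within a category is actually computed is not stated. One simple way to sketch it, assuming per-cycle energy figures in kWh and an arbitrary 75% cutoff (both hypothetical, as are the function names), is to check what share of comparable appliances use less energy:

```python
def share_using_less(consumption_kwh, peers_kwh):
    """Fraction of peer appliances in the same category that use less energy."""
    if not peers_kwh:
        raise ValueError("need at least one peer appliance to compare against")
    return sum(1 for p in peers_kwh if p < consumption_kwh) / len(peers_kwh)


def indicator(consumption_kwh, peers_kwh, cutoff=0.75):
    """Return 'red' if at least `cutoff` of the peers use less energy, else 'green'."""
    share = share_using_less(consumption_kwh, peers_kwh)
    return "red" if share >= cutoff else "green"


# Made-up per-cycle consumption of other washing machines, in kWh.
peers = [0.8, 1.0, 1.2, 1.5]
```

With these made-up numbers, `indicator(0.9, peers)` gives `"green"` because only one of four peers uses less, while `indicator(1.3, peers)` gives `"red"` because three of four do.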
The first minimum viable products focused on four white good products: tumble dryer, dishwasher, washing machine, and fridge. Since early 2018, ten use cases have been taken into production.
Expanding Applications
Besides electricity disaggregation algorithms, Quby also started developing models for gas disaggregation. These models not only process data from the boiler to recognize individual appliances that are used for space heating, cooking, and water heating, but can also be used to determine how well a house is insulated or even the impact of certain consumer habits on energy consumption.
From Proof of Concept to Production
In the spring of 2017, when Quby tested their latest Python app with a group of five hundred users, they set the goal of having a production-ready version available by autumn. To speed up this process, Quby decided to convert the algorithms from Python and PySpark to Scala running on Spark. Although Quby kept using Python for prototyping and plotting, Scala became the go-to language for productionizing applications. Tim van Cann, data engineer at GoDataDriven, helped Quby’s R&D team build up their Scala skills. “Quby has highly motivated data scientists and engineers who are all eager to learn and not afraid to step outside their comfort zone. This attitude has created a very productive and open environment where people work on valuable services as a team.”
To minimize the engineering efforts, Quby switched to Databricks on Amazon Web Services to manage their data.
“The things we liked most about Databricks on AWS were the Notebook functionality, the availability of Hive, scheduling features, and the ability to deliver data science services, all from one environment, without the need for additional full-time data engineers”, explains Galsworthy. Also, Quby and GoDataDriven started creating libraries and stand-alone functions to make the process of algorithm development more efficient.
Scaling up
Quby moved through several minimum viable products with a handful of users and a simple web app that used green ticks or red exclamation marks as energy efficiency indicators. The AWS cloud platform proved its scalability and ability to handle peak loads. As features were added, the number of users grew from 500 to over 300,000.
Although scalable infrastructure was in place and the team had matured, going from an MVP for 50 users to an app with data-intensive features for 300,000 users was a challenging task.
“You should not underestimate the steps to go from thousands to hundreds of thousands of users. We learned a lot during the journey and in every stage, we needed to decide on the right technology and process. All of this requires a totally different mindset. We are very pleased that we have been able to work with GoDataDriven to guide us along the way”, said Galsworthy.
“GoDataDriven’s team is very experienced in developing scalable, robust, and tested data products in the cloud that can be taken into production easily”