Stop Blaming Your Data

Ruben van de Geer
21 December, 2020

Anyone who has ever worked as a data {scientist, engineer, analyst} knows that, at some point during the project, the data (quality) is going to ruin the party. You have your models and cross-validation running on your state-of-the-art CI/CD pipeline, but find out that the predictions are essentially trash. After some digging, you find out that the data you have been given is noisy, incomplete, biased, and incorrect altogether.

Quite often, this is the moment when the blame game starts. Everyone complains and blames the database folks: they’ve failed miserably in storing high-quality data. You wonder how your PO could have ever come up with the idea to use data science on this data.

I strongly believe this is a bad trait that many data people, yours truly included, suffer from. It is a counterproductive and unprofessional attitude that drains the energy from you and the rest of your team. So, what should you do?

Stop blaming the data. Data is hardly ever perfect, and you are often the first one to apply machine learning to it. Most datasets have been maintained for operational purposes, not to train a deep learning model on. Also, end users and managers are hardly ever interested in what can't be done, or in scapegoats of any kind. And the most likely outcome if you continue the blame game? No models and no data science anymore. Nobody wins.

Be constructive and pragmatic. The business wants solutions, not problems. And a solution might not be a complex model requiring perfect data. Forget the initial approach and familiar techniques. Data science is, despite the name, an applied art that needs creativity. The science happens elsewhere.

Be the positive voice in your team. Even if the data sucks and you have been dealt a bad hand, be the person who inspires and encourages. Don't whine during every stand-up. Write a blog post instead.
