Posted on — 22 December 2021

Soda and GoDataDriven Partnership

Soda and GoDataDriven Partner to Build Optimal Data Quality Workflows for Organizations



Soda, the provider of data reliability tools and an observability platform, has today announced a partnership with Xebia’s data & AI consultancy GoDataDriven to solve critical data reliability and quality issues faced by organizations. The partnership includes the co-development of a new Open Source Software (OSS) Spark library that gives organizations a big-data-friendly solution for maintaining data quality, and an agreement to build optimal data quality workflows for joint customers.

Apache Spark is one of the most popular open source projects on the planet, with more than 1,000 contributors from over 250 organizations. Spark’s popularity is driven by its ability to process large datasets at speed, APIs that enable flexibility and simple migration to distributed frameworks, and versatility in connecting to virtually any data source. But with so many organizations relying on this data to build and maintain data products, data teams have so far lacked the transparency needed to ensure data quality. Working alongside GoDataDriven, whose clients include Ahold, Unilever, Mollie, and ING, Soda has released Soda Spark, the latest OSS release to provide a common solution to a common problem for data engineers.

Soda Spark is part of a growing suite of OSS data reliability tools for engineers working in data-intensive environments where data quality is paramount. Using the Soda Spark library, data engineers can log errors from failed tests using their preferred logging system, or through Soda Cloud, and avoid writing corrupt data to their data lakes. And because Spark DataFrames are data source agnostic, Soda Spark scans can run against a variety of data sources, including Amazon Athena, Amazon Redshift, Google BigQuery, and Snowflake.

"Transparency creates a trust in data that becomes the catalyst for an organization to be truly data enabled and make confident, data-driven decisions,” explains Niels Zeilemaker, CTO, GoDataDriven. “In Soda, we recognised a shared belief in improving end-to-end data issue workflows so that data teams have the power to prioritize and resolve issues based on a holistic view of what is happening across an entire organization. Soda’s low technical barriers and vibrant developer community provide the ideal platform to meet the needs of our customers, starting with the co-development of a Spark integration which further extends the data quality workflow."

With so much data being produced on a daily basis, Spark’s ability to unify disparate data processing capabilities, allowing developers to use a single framework to accommodate all their processing needs, has seen it become a critical part of the modern data stack. From the creation of on-demand video streaming tailored precisely to viewers’ preferences, to banks crunching huge volumes of machine learning data to support fraud prevention, Spark is fundamentally changing the business world.

“Achieving true end-to-end observability across the data product lifecycle requires everyone in a data team – from data engineers through to analysts – to be able to understand data, rely on it, and keep on top of it,” said Maarten Masschelein, CEO & Co-Founder, Soda. “Through our partnership with GoDataDriven, a consultancy that has built a stellar reputation working in lock-step with some of the most high profile brands in the Netherlands and Germany, Soda is providing the end-to-end data quality workflows and data engineering practices to enable the trust and confidence in data that organizations need to become truly data informed.”

Soda Spark is provided as open source software under the Apache License and offered for free on GitHub. The product runs either in the cloud or on the local systems of data engineers. For more information and to learn how to get started, please visit GitHub.

About Soda

Soda is the data reliability company that provides Open Source (OSS) and SaaS tools that enable data teams to discover, prioritize, and resolve data issues. Soda’s mission is to bring everyone closer to the data, resulting in data products and analytics that everyone can trust. Soda is one of the 2021 Gartner® Cool Vendors™ in Data Management [1], recognition and validation for our approach to solving the number one data management challenge faced by modern organizations: ensuring high quality, trusted data is available to enable confident decision making, serve and delight customers, and improve processes. For more information, visit

About GoDataDriven

GoDataDriven helps the world’s Top 250 companies and category leaders embrace data innovation, adopt the latest data and AI technologies, and establish enterprise-wide data & AI training programs. GoDataDriven is part of Xebia, a global IT pioneer providing high-quality consulting services that cover all aspects of digital transformation, from software development to cloud, data, AI, software consulting, DevOps, and Agile. Its clients include, among others, Disney, Ahold Delhaize, Tesco, Philips, and ING bank. Xebia employs 3,100 people who work from strategically located offices in Europe, APAC, UAE, the UK, and the US, and had revenue of 200+ million euros over the last four quarters.

[1] Gartner “Cool Vendors in Data Management,” Philip Russom, Ehtisham Zaidi, Jason Medd, Eric Thoo, Robert Thanaraj, 1 June 2021 – ID G00746797
