Data Processing at Scale

Two-Day Training

Data is knowledge, and knowledge is power. But processing data efficiently becomes challenging as it scales up. This training dives deep into one of the most popular and scalable tools for large-scale data transformation: Apache Spark!

Download Training brochure

Download the GoDataDriven brochure for a complete overview of available training sessions and data engineering, data science, and analytics translator learning journeys.

This online course is perfect for

Data Engineers and Machine Learning Engineers who transform large volumes of data. Basic experience with Python is required. If you’re not quite there yet, we recommend the Python for Data Engineers course as preparation for this training.

What will you learn during Data Processing at Scale?

After this training, you will understand how Apache Spark works and have the essential skills to write efficient Spark ETL jobs that process large data sets.

The Program

The program consists of both theory and hands-on exercises.


  • Inner-workings of Apache Spark
  • Loading data from various formats
  • Basic and advanced dataframe operations
  • Window and user-defined functions
  • Unit testing
  • Hands-on exercise to analyze large-scale logs to find trending topics
  • How Apache Spark works and advanced features of the tool
  • How to write efficient ETL jobs
  • Basic and advanced use of the API to transform data
  • How to think in terms of distributed systems when writing Spark jobs

Training Formats

This training is available in the following formats:

In-Company Classroom

In-Company training is perfect for groups of 6 or more. The training takes place online, at your office, or at one of our modern training facilities.

Online Virtual Classroom

Virtual Classrooms provide you with an interactive environment to effectively develop your skills, right from the comfort of your own home or office.

Data Engineering Learning Journey

This data engineering learning journey is open to all data experts. Our extensive training programs are designed to develop your skills from junior to senior level.

How do you become a data engineering expert? Start here! We’ve put together a carefully crafted learning journey for data engineers. Knowing engineers love to figure things out on their own, we packed the program with opportunities to learn, hands-on, by solving real-life situations. Plus, there’s plenty of practical philosophy, too.

We’ll teach you how to use Docker to ease your deployments and how to navigate code written by data scientists (Advanced Python and Data Science in Production). You will learn to use Apache Airflow, Apache Spark, and Kafka like a forklift to move data around.

Any questions? Please get in touch!

Contact Gert-Jan Steltenpool, our Sales Director, if you want to know more. He’ll be happy to help you!