
Data Processing at Scale Training

English | 2-day
Xebia Academy

Data Processing at Scale

Data is knowledge, and knowledge is power. But processing data efficiently becomes harder as volumes grow. This training dives deep into one of the most popular and scalable tools on the market for large-scale data transformation: Apache Spark!

What you'll learn

  • How Apache Spark works, including its advanced features
  • How to write efficient ETL jobs
  • Basic and advanced use of the API to transform data
  • How to think in terms of distributed systems when writing Spark jobs

The Program

The program consists of both theory and hands-on exercises.


  • Inner workings of Apache Spark
  • Loading data from various formats
  • Basic and advanced DataFrame operations
  • Window and user-defined functions
  • Unit testing
  • Hands-on exercise to analyze large-scale logs to find trending topics

Climbing a steep Python and Machine Learning curve in three days. This would have taken me months on my own.

Data Scientist, FD Mediagroep

This online course is perfect for

Data and Machine Learning Engineers who transform large volumes of data. Basic experience with Python is required. If you’re not quite there yet, we recommend the Python for Data Engineers course as preparation for this training.

What will you learn during Data Processing at Scale?

After this training, you will understand how Apache Spark works and have the essential skills to write efficient Spark ETL jobs that process large datasets.


The Learning Journey for Data Engineers

Learn how to take data and AI ideas from concept to prototype to production-ready application. Acquire the skills to develop and run data and AI solutions at enterprise scale with ease. Take part in a specific training or advance through the entire journey. Learn how to build secure data platforms and reliable AI applications that are engineered for scale.

The Right Format For Your Preferred Learning Style

At GoDataDriven, we offer four distinct training formats:

  • In-Classroom & In-Company Training
  • Online, Instructor-Led Training
  • Hybrid and Blended Learning
  • Self-Paced Training


Clients we've helped

  • ING Bank
  • Ahold Delhaize
  • Quby