IN-COMPANY TRAINING PROGRAMS

Contact Giovanni Lanzani if you want to know more about custom data & AI training for your teams. He’ll be happy to help you!

Data Processing at Scale

Data is knowledge, and knowledge is power. But processing data efficiently becomes challenging as it scales up. This training takes a deep dive into one of the most popular and scalable tools on the market for large-scale data transformation: Apache Spark!

Clients we've helped

  • DSM
  • DuPont
  • Booking.com
  • LEGO
  • Airbus
  • Merck
  • Ahold Delhaize
  • Credit Suisse
  • Shell
  • ING Bank
  • Danone
  • Nike
  • TomTom
  • Verizon

What you'll learn

  • How Apache Spark works and advanced features of the tool
  • How to write efficient ETL jobs
  • Basic and advanced use of the API to transform data
  • How to think in terms of distributed systems when writing Spark jobs

The schedule

Contents:

The program consists of both theory and hands-on exercises.

  • Inner-workings of Apache Spark
  • Loading data from various formats
  • Basic and advanced dataframe operations
  • Window and user-defined functions
  • Unit testing
  • Hands-on exercise to analyze large-scale logs to find trending topics

Data Engineering Learning Journey

This online course is perfect for

Data and Machine Learning Engineers who transform large volumes of data. Basic experience with Python is required. If you’re not quite there yet, we recommend the Python for Data Engineers course as preparation for this training.

What will you learn during Data Processing at Scale?

After this training, you will understand how Apache Spark works and have the essential skills to write efficient Spark ETL jobs that process large data sets.

Meet your trainer

Andrew Snare

Big data hacker

Andrew is a Big Data Hacker at GoDataDriven. He is an experienced software engineer with a deep understanding of numerous technologies and languages.

Andrew is a certified Cloudera, Databricks, and Cassandra instructor, and also enjoys sharing his experiences on stage, for example at Goto Conference.

Flexible delivery

The Right Format For Your Preferred Learning Style

  • In-Classroom & In-Company Training
  • Online, Instructor-Led Training
  • Hybrid and Blended Learning
  • Self-Paced Training

Get in touch with the experts

Have any questions?

Contact Giovanni Lanzani, our Managing Director of Learning and Development, if you want to know more. He’ll be happy to help you!

You can also reach him by phone at +31 6 51 20 6163.

Course: Data Processing at Scale Training