Data Science with Spark

Two-Day Training

Learn to combine SQL, Streaming, and Complex Analytics at Scale!

Apache Spark is a powerful open-source processing engine built around speed, ease of use, and advanced analytics. Guided by our experienced consultants, you will learn to unlock its full potential and master this challenging tool yourself.

“I liked every aspect of this training and would like to thank the trainers. They did an excellent job of explaining how to use Spark for data science. This is the fourth GoDataDriven training I’ve followed. All were great, but this was the best one so far.” —Data Scientist, Knab

Register Now Through the Xebia Academy Website

You will be redirected to the Xebia Academy Website for registration

This training is perfect for

Anyone working in an organization that uses Apache Spark and wants to get the most out of it. The training is not limited to Data Scientists who wish to scale their projects. Data Engineers, Data Analysts, Software Programmers, and Database Administrators who want to make the most of Apache Spark will also benefit from this course. Prior experience with Python or software programming is required. Experience with SQL and pandas is helpful, but not required.

What will you learn during this training?

Gain the theoretical knowledge, hands-on experience, and best practices you need to get the most out of Apache Spark. After completing the training, you will be able to confidently use Apache Spark for data science at scale.


The program consists of both theory and hands-on exercises.

Day 1:

  • Spark basics
  • Advanced Spark
  • DataFrames

Day 2:

  • Window functions

Day 3:

  • Spark structured streaming
  • Integrating Apache Spark with Apache Kafka

You will learn:

  • The difference between transformations and actions
  • How Spark optimizes code through laziness and lineage
  • About caching and persistence levels
  • All about Spark DataFrames and how they interoperate with pandas
  • The functions API and how to join data
  • Window operations and user-defined functions
  • The API
  • Preprocessing data and feature engineering
  • The various components of Spark Structured Streaming

Download Training brochure

Download the GoDataDriven brochure for a complete overview of available training sessions and data engineering, data science, and analytics translator learning journeys.

Training Formats

This training is available in the following formats:

In-Company Classroom

In-Company training is perfect for groups of six or more. The training takes place online, at your office, or at one of our modern training facilities.

Online Virtual Classroom

Virtual Classrooms provide you with an interactive environment to effectively develop your skills, right from the comfort of your own home or office.

Data Science Learning Journey

This data science learning journey is available for any data professional. Our extensive training programs are designed to develop your skills from junior to senior.

Our curriculum teaches you new skills and empowers you to stay ahead professionally. Our practical Python courses give you solid fundamentals, whether you are a beginner or an advanced user. We also offer courses on Spark, R, and Deep Learning.

We’ve experienced firsthand what works and what doesn’t through our consulting business, and we pass that knowledge on to you through our education business. You learn the ins and outs of the data science models most commonly used in the field, in fast-paced classroom training that ups your game.

Click here for more information about the Learning Journey for Data Scientists

More information

Any questions? Please get in touch!

Contact Gert-Jan Steltenpool, our Sales Director, if you want to know more. He’ll be happy to help you!