Training schedule

  • 15 Mar - 17 Mar, 2021: Online, instructor-led / English, €1795
  • 2 Jun - 4 Jun, 2021: Online, instructor-led / English, €1795
  • 13 Sep - 15 Sep, 2021: Online, instructor-led / English, €1795
  • 1 Dec - 3 Dec, 2021: Online, instructor-led / English, €1795

In-company training programs

Contact Gert-Jan Steltenpool if you want to know more about custom data & AI training for your teams. He’ll be happy to help you!

Data Science - Senior

Data Science with Spark

Apache Spark is a powerful open-source processing engine built around speed, ease of use, and advanced analytics. Guided by our experienced consultants, you will learn to unlock its full potential and master this challenging tool yourself.

“I liked every aspect of this training and would like to thank the trainers. They did an excellent job of explaining how to use Spark for data science. This is the fourth GoDataDriven training I’ve followed. All were great, but this was the best one so far.” —Data Scientist, Knab

Clients we've helped

  • DSM
  • DuPont
  • Booking.com
  • LEGO
  • Airbus
  • Merck
  • Ahold Delhaize
  • Credit Suisse
  • Shell
  • ING Bank
  • Danone
  • Nike
  • TomTom
  • Verizon

What you'll learn

Spark basics

  • Spark execution
  • SparkSession
  • Transformations vs. actions
  • Laziness and lineage: how Spark optimizes code
  • How to use the Spark UI

Advanced Spark

  • How to apply partitioning and how Spark reads and writes data
  • Shuffling, narrow vs. wide operations, and their impact on performance
  • The Catalyst optimizer
  • Scheduling and job execution
  • Caching and persistence levels

DataFrames

  • The basic concepts
  • All about Spark DataFrames and pandas DataFrames
  • How to load and save DataFrames
  • The functions API
  • How to join data
  • User-defined functions and pandas user-defined functions (with performance implications)
  • Window operations

Spark.ml

  • Machine Learning with Spark
  • Pre-processing data and feature engineering
  • Model selection
  • Pipeline API
  • Advanced topics

Spark structured streaming

  • Structured streaming
  • Machine Learning & streaming
  • Sources and sinks
  • Windows & aggregations
  • Checkpointing & watermarking
  • Fault tolerance & Kafka
  • Kafka as a source and as a sink

The schedule

Day 1
  • Spark basics
  • Advanced Spark
  • DataFrames
Day 2
  • Window functions
  • Spark.ml
Day 3
  • Spark structured streaming
  • Integrating Apache Spark with Apache Kafka

Learning journey

Data Science Learning Journey

Meet your trainer

Vadim Nelidov

Data Enchanter

Vadim is a data scientist passionate about solving data-driven problems and sharing his analytical insights to make data literacy a reality for all.

Flexible delivery

The Right Format For Your Preferred Learning Style

In-Classroom & In-Company Training
Online, Instructor-Led Training
Hybrid and Blended Learning
Self-Paced Training

Get in touch with the experts

Have any questions?

Contact Gert-Jan Steltenpool, the sales director of GoDataDriven Academy, if you want to know more. He’ll be happy to help you!

You can also reach him by phone at +31 6 4214 0783.
