Data Processing at Scale
Data is knowledge, and knowledge is power. But processing data efficiently becomes challenging as it scales. This training dives deep into one of the most popular and scalable tools on the market for large-scale data transformation: Apache Spark!
Download Training brochure
Download the GoDataDriven brochure for a complete overview of available training sessions and data engineering, data science, and analytics translator learning journeys.
This online course is perfect for
Data and Machine Learning Engineers who transform large volumes of data. Basic experience with Python is required. If you’re not quite there yet, we recommend the Python for Data Engineers course as preparation for this training.
What will you learn during Data Processing at Scale?
After this training, you will understand how Apache Spark works and have acquired the skills needed to write efficient ETL Spark jobs that process large sets of data.
The program consists of both theory and hands-on exercises.
- Inner workings of Apache Spark
- Loading data from various formats
- Basic and advanced dataframe operations
- Window and user-defined functions
- Unit testing
- Hands-on exercise to analyze large-scale logs to find trending topics
- How Apache Spark works and advanced features of the tool
- How to write efficient ETL jobs
- Basic and advanced use of the API to transform data
- How to think in terms of distributed systems when writing Spark jobs
This training is available in the following formats:
In-Company Training
In-company training is perfect for groups of 6 or more. The training takes place online, at your office, or at one of our modern training facilities.
Online Virtual Classroom
Virtual Classrooms provide you with an interactive environment to effectively develop your skills, right from the comfort of your own home or office.
Data Science Engineering Journey
This data engineering learning journey is open to any data professional. Our extensive training programs are designed to develop your skills from junior to senior.
How do you become a data engineering expert? Start here! We’ve put together a carefully crafted learning journey for data engineers. Knowing engineers love to figure things out on their own, we packed the program with opportunities to learn, hands-on, by solving real-life situations. Plus, there’s plenty of practical philosophy, too.
We’ll teach you how to leverage Docker to ease your deployments and navigate code written by data scientists (Advanced Python and Data Science in Production). You will learn to wield Apache Airflow, Apache Spark, and Kafka like a forklift to move data around.