Data Processing at Scale
Data is knowledge and knowledge is power. But processing data efficiently becomes challenging as it scales up. This training dives deep into one of the most popular and scalable tools on the market for large-scale data transformation: Apache Spark.
What you'll learn
- How Apache Spark works, including its advanced features
- How to write efficient ETL jobs
- Basic and advanced use of the API to transform data
- How to think in terms of distributed systems when writing Spark jobs
The program combines theory with hands-on exercises and covers:
- Inner workings of Apache Spark
- Loading data from various formats
- Basic and advanced DataFrame operations
- Window and user-defined functions
- Unit testing
- Hands-on exercise to analyze large-scale logs to find trending topics
This online course is perfect for
Data and Machine Learning Engineers who transform large volumes of data. Basic experience with Python is required. If you’re not quite there yet, we recommend the Python for Data Engineers course as preparation for this training.
What will you learn during Data Processing at Scale?
After this training, you will understand how Apache Spark works and will have the essential skills needed to write efficient Spark ETL jobs that process large data sets.
Andrew Snare, Big Data Hacker
Andrew is a Big Data Hacker at GoDataDriven. He is an experienced software engineer with a deep understanding of numerous technologies and languages.
Andrew is a certified Cloudera, Databricks, and Cassandra instructor, and also enjoys sharing his experiences on stage, for example at Goto Conference.
Structured, to-the-point, good combination of theory and practical examples, very knowledgeable trainer who can explain concepts very well
It was a hands-on and tangible course. We could apply what we learned in a matter of minutes. The trainer did a great job of answering ad-hoc questions that complemented the material. We appreciated the fact that we could apply what we were taught directly to our company.
I liked every aspect of this training and would like to thank the trainers. They did an excellent job of explaining how to use Spark for data science. This is the fourth GoDataDriven training I’ve followed. All were great, but this was the best one so far.
Climbing a steep Python and Machine Learning curve in three days. This would have taken me months on my own.