Data Processing at Scale
Data is knowledge, and knowledge is power. But processing data efficiently becomes harder as volumes grow. This training takes a deep dive into one of the most popular and scalable tools for large-scale data transformation: Apache Spark!
What you'll learn
- How Apache Spark works, including its advanced features
- How to write efficient ETL jobs
- Basic and advanced use of the API to transform data
- How to think in terms of distributed systems when writing Spark jobs
The program combines theory with hands-on exercises and covers:
- Inner workings of Apache Spark
- Loading data from various formats
- Basic and advanced DataFrame operations
- Window and user-defined functions
- Unit testing
- Hands-on exercise: analyzing large-scale logs to find trending topics
Climbing a steep Python and Machine Learning curve in three days. This would have taken me months on my own.
This online course is perfect for
Data and Machine Learning Engineers who transform large volumes of data. Basic experience with Python is required. If you’re not quite there yet, we recommend the Python for Data Engineers course as preparation for this training.
What will you learn during Data Processing at Scale?
After this training, you will understand how Apache Spark works and have the essential skills to write efficient Spark ETL jobs that process large data sets.
The Learning Journey for Data Engineers
Learn how to take data and AI ideas from concept to prototype to production-ready application. Acquire the skills to develop and run data and AI solutions at enterprise scale with ease! Take part in a specific training or advance through the entire journey. Learn how to build secure data platforms and reliable AI applications that are engineered for scale.
The Right Format For Your Preferred Learning Style
At GoDataDriven, we offer four distinct training formats:
- In-Classroom & In-Company Training
- Online, Instructor-Led Training
- Hybrid and Blended Learning
- Self-Paced Training