This graduate course will teach students to design and analyze programs for big data. It will cover big data architectures, programming languages, and ecosystems, with a focus on Spark. The techniques presented in the course are expected to have a high impact in a variety of fields, such as data analysis, customer recommendation, trend prediction, and pattern recognition.
Credits: 7.5 hp
Examiner: Pierre Nugues, firstname.lastname@example.org
Contact Pierre if you would like to take the course.
Course instructors: Peter Exner, Marcus Klang, Dennis Medved, Håkan Jonsson.
The course consists of four full-day sessions that will address:
- Cloud architectures, Spark concepts, and Spark programming
- Intermediate and advanced Spark
- Supervised machine learning with Spark: MLlib and MLlib programming
- Unsupervised machine learning
Each session will be divided into lectures and hands-on programming exercises. Participants must complete weekly programming projects and write short reports analyzing landmark papers in the field.
Expected prior knowledge
Good programming skills in Java, Scala, or Python. Knowledge of statistics.
Course literature
- Holden Karau, Andy Konwinski, Patrick Wendell, Matei Zaharia, Learning Spark, O'Reilly Media, 2015, ISBN 978-1-449-35862-4
- Sandy Ryza, Uri Laserson, Sean Owen, Josh Wills, Advanced Analytics with Spark, O'Reilly Media, 2015, ISBN 978-1-491-91276-8
Schedule
- Lecture 1: Monday, September 7, 9 to 12 and 13 to 16, Room E:2116. Instructors: Peter Exner (main), Marcus Klang (assistant)
- Lecture 2: Monday, September 14, 9 to 12 and 13 to 16, Room E:2116. Instructors: Marcus Klang (main), Peter Exner (assistant)
- Lecture 3: Monday, September 28, 9 to 12 and 13 to 16, Room E:2116. Instructors: Dennis Medved (main), Marcus Klang (assistant)
- Lecture 4: Monday, October 5, 9 to 12 and 13 to 16, Room E:2116. Instructors: Håkan Jonsson (main), Dennis Medved (assistant)