Setting up Spark

Setting up the Environments

Overview

In this course, we will use Spark as the computing environment (http://spark.apache.org/). Spark offers APIs in four programming languages: Scala, Java, R, and Python. We have chosen Python as the language of the course, together with Jupyter notebooks, because it makes programming easier.

To use Spark, you will have to install it on your machine before you start the course. We will also demonstrate how to run some of the computing experiments on the 150-core Crafoord cluster.

  1. We have created a Linux virtual machine so that all the participants share the same environment. This machine contains all the software we will use (Spark, Python, and Jupyter) as well as a small application to download the data sets. The setup instructions are on this page: http://semantica.cs.lth.se/pyspark/. Please make sure you have installed it before you start the course. Should you experience problems, please report them to us.
  2. We will provide you with accounts to log on to the cluster when you start the course.
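
One quick way to confirm that the virtual machine is ready is to check that Python can locate the packages the course relies on. The sketch below is only an illustration and not part of the official setup instructions: the function name check_environment is hypothetical, the module names pyspark and notebook are assumptions about how Spark's Python API and Jupyter are packaged on the machine, and it assumes Python 3.4 or later.

```python
import importlib.util


def check_environment(modules=("pyspark", "notebook")):
    """Return a dict mapping each module name to True if Python can locate it.

    The default module names are assumptions; adjust them to match
    the packages actually installed on your machine.
    """
    return {name: importlib.util.find_spec(name) is not None
            for name in modules}


if __name__ == "__main__":
    # Print one status line per module so a missing package is easy to spot.
    for name, found in check_environment().items():
        print("{}: {}".format(name, "found" if found else "MISSING"))
```

If a package is reported as missing, it may simply be installed under a different name or for a different Python interpreter, so treat the result as a hint rather than a definitive diagnosis.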
Page manager: pierre.nugues@cs.lth.se | 2015-11-06