UvA course 2016-06-15
Extras - Wave simulation with MPI

This is an exercise from the Extras part of the Tutorial UvA course 2016-06-15.

In this advanced part of our HPC Cloud tutorial we ask you to play around with a parallel processing technique on a message-passing system. For this purpose, we will be running wave simulations using MPI. We will approximate solutions of the wave differential equation in 2D using numerical methods.
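As a reminder of what is being approximated, the 2D wave equation and one common explicit finite-difference update are written out below. This is a sketch for orientation only: the symbols (wave speed c, grid spacing h, time step Δt, grid indices i, j and time level n) and the choice of a central-difference scheme are our assumptions for illustration, and the provided code may discretise the equation differently.

```latex
% 2D wave equation for the displacement u(x, y, t)
\frac{\partial^2 u}{\partial t^2}
  = c^2 \left( \frac{\partial^2 u}{\partial x^2}
             + \frac{\partial^2 u}{\partial y^2} \right)

% Central differences in time and space on a uniform grid
u^{n+1}_{i,j} = 2u^{n}_{i,j} - u^{n-1}_{i,j}
  + \left( \frac{c\,\Delta t}{h} \right)^{2}
    \left( u^{n}_{i+1,j} + u^{n}_{i-1,j} + u^{n}_{i,j+1} + u^{n}_{i,j-1} - 4u^{n}_{i,j} \right)
```

Each new value depends only on the point itself and its four neighbours, which is why the grid can be split among processes that exchange just their boundary values.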

NOTE:

You are now in the advanced section of the workshop. You have your laptop and an Internet connection. We expect you will be able to find out more on your own about things that we hardly/don’t explain but which you think you need. For example, if we were you, at this point we would’ve already googled for several things:

  1. Numerical methods
  2. Wave differential equation
  3. MPI cheatsheet

We provide you with an implementation of that simulation using MPI. You will be asked to perform multiple runs of each program, so that fluctuations caused by, e.g., the network can be averaged out.

Tip:

We recommend you have a look at the end of this page for some hints on measuring time, running a program multiple times and computing an average time out of multiple measurements.

a) Setting up a VM for the exercise

We will be creating a 2-core VM for this exercise.

ssh -X ubuntu@145.100...
sudo add-apt-repository "deb http://archive.ubuntu.com/ubuntu $(lsb_release -sc) main universe"
sudo apt-get update
sudo apt-get install build-essential

Optionally verify gcc and GNU make installation and version with gcc -v and make -v respectively.

sudo apt-get install libhdf5-serial-dev libopenmpi-dev openmpi-bin openmpi-common hdf5-tools imagemagick gnuplot
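If you want to double-check that the installation worked before moving on, a small check along these lines may help. It is only a sketch: missing_tools is a hypothetical helper name, and the list of commands is what we assume the packages above provide (mpicc and mpirun from Open MPI, h5dump from hdf5-tools, convert from ImageMagick).

```shell
# Hypothetical helper: print every listed command that is NOT found on the PATH.
missing_tools() {
    for tool in "$@"; do
        command -v "$tool" >/dev/null 2>&1 || echo "$tool"
    done
}

# Commands we assume this exercise relies on
missing=$(missing_tools gcc make mpicc mpirun h5dump gnuplot convert)
if [ -z "$missing" ]; then
    echo "all tools found"
else
    # left unquoted on purpose so the names are printed on one line
    echo "missing:" $missing
fi
```

Any name printed after "missing:" points at a package that probably did not install correctly.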

b) Preparing the program

wget https://github.com/sara-nl/clouddocs/raw/gh-pages/UvA-course-20160615/code/waveeq.tar.gz
tar -zxf waveeq.tar.gz
cd waveeq/
ls -l
make wave4

c) Serial runs

The main routine is in the file wave4.c. It uses functionality from other .c and .h files, but the main loop is there. The program finds out how many MPI processes it has been started with and divides the space among them to distribute the work.
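To get a feeling for what "dividing the space" can mean, here is a minimal sketch of a block decomposition by rows. It is not taken from wave4.c: split_rows is a hypothetical helper, and splitting by rows is our assumption for illustration, so check the code to see how the real program does it.

```shell
# Hypothetical helper (not taken from wave4.c): print the block of grid rows
# that each MPI rank would own when `rows` rows are split as evenly as possible.
# usage: split_rows <rows> <nprocs>
split_rows() (
    rows=$1; nprocs=$2
    base=$(( rows / nprocs ))     # rows every rank gets at least
    extra=$(( rows % nprocs ))    # leftover rows, one each for the first ranks
    rank=0; start=0
    while [ "$rank" -lt "$nprocs" ]; do
        count=$base
        if [ "$rank" -lt "$extra" ]; then count=$(( base + 1 )); fi
        echo "rank $rank: rows $start..$(( start + count - 1 ))"
        start=$(( start + count ))
        rank=$(( rank + 1 ))
    done
)

# Example: 10 rows over 4 processes
split_rows 10 4
```

With 10 rows and 4 processes the first two ranks get 3 rows each and the last two get 2, so no process has more than one row of extra work.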

When the program runs, it writes output to file wave4.h5 in HDF5 format.

We invite you to explore the code to get familiar with it. If you change any values in the code, please remember to compile the program again for the changes to take effect.

You can run the program in a single process with the following command:

time ./wave4

You can use the provided program h5anim to read the output file of the wave4 program (remember, it is called wave4.h5) and create an animated wave4-anim.gif. You do it like this:

./h5anim wave4.h5 u

Then you can use any browser to view the animated gif (e.g. with the command firefox wave4-anim.gif), or install the program gifview (with sudo apt-get install gifsicle) and view it with: gifview --animate wave4-anim.gif.

Food for brain c1:

  • Can you make a batch of several runs and observe the performance?

d) Prepare for multi-VM

We will want to show how to scale out later, and that will involve multiple VMs, as explained during the presentation. In order for multiple VMs to be able to run out of the same image, the image must be non-persistent (as explained in Part B).

ssh-keygen -t rsa -f /home/ubuntu/.ssh/id_rsa  
#Enter passphrase (empty for no passphrase): <<<<leave this empty (hit enter twice)>>>>
cat /home/ubuntu/.ssh/id_rsa.pub >> /home/ubuntu/.ssh/authorized_keys

Previously we made the image persistent so that when shutting the VM down, changes would be saved and kept for the next run. But changes are only saved when you actually shut the VM down gracefully.

IMPORTANT

Now that the image is non-persistent, no changes will be saved when you shut down a VM using it. If you need to save changes at some point, you will have to make the image persistent first!

e) Multicore version

Because the program is ready for MPI, you can use mpirun to use multiple cores.

Exercise e1: Try now the MPI run with 2 processes, like this:

time mpirun -n 2 ./wave4

Food for brain e1:

  • Is the output image from this multi-process the same as the single-core one?
  • How many processes are running? (hint: use the top command on a different terminal)
  • Do you see any significant time improvement as compared to running it with one process? Can you explain the improvement (or lack thereof)?
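One way to put a number on "time improvement" is to compute the speedup (serial time divided by parallel time) and the parallel efficiency (speedup divided by the number of processes). The sketch below does that with awk; speedup is a hypothetical helper name and the timings in the example are made up, so substitute your own measurements.

```shell
# Hypothetical helper: speedup and parallel efficiency from two wall-clock times.
# usage: speedup <t_serial_seconds> <t_parallel_seconds> <nprocs>
speedup() {
    awk -v t1="$1" -v tp="$2" -v p="$3" 'BEGIN {
        s = t1 / tp                      # speedup relative to the 1-process run
        printf "speedup %.2f efficiency %.2f\n", s, s / p
    }'
}

# Example with made-up timings: 12.0 s on 1 process, 7.5 s on 2 processes
speedup 12.0 7.5 2
```

An efficiency close to 1 means the extra processes are well used; a value well below 1 suggests that communication or other overhead is eating into the gain.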

Exercise e2: You can now try to run the program with more processes. For example, with 4:

time mpirun -n 4 ./wave4

Food for brain e2:

  • Is the output image from this multi-process the same as that from previous runs?
  • Can you make a batch of several runs (e.g.: 20) and calculate the average runtime and standard deviation?
  • How many processes are running? (hint: use the top command on a different terminal)
  • Do you see any significant time improvement as compared to the previous runs? Can you explain the improvement (or lack thereof)?

Exercise e3: Try now running the program with a couple more configurations, like 6 or 8 processes. Any improvement in time?

f) Scaling out to multiple VMs

MPI is able to communicate between processes that may physically be running on different (virtual) machines. We are going to make this happen now.

There is a fair amount of configuration that needs to happen among all the machines involved in cooperating for running MPI jobs. We have prepared a couple of scripts that you can use for that.

Master-workers concept

A typical way of considering a cluster is to have a master node (or host) that you can log into externally and launch the programs from, along with a set of worker nodes that the master knows about and to which it delegates computing workload.

In our exercises, we will consider one master and one worker node. Both will compute (so the master will not be just a passive node, but it will also contribute to the output). We will not consider any job-submitting queues, but rather, we will let MPI communicate over SSH. For that, both the master and the worker need to be able to SSH to each other without requiring a password (a.k.a. passwordless SSH). We provide you with a script that you can run on each of the machines (in this case just one master and one worker) to do this interactively in a, hopefully, easy way.

To make it easier, all nodes where MPI will run a program must have that program installed the same way in the same path. Because we have been carefully building the image so far, that is already done.

Also, usually the worker nodes are protected (inaccessible) from the outside world, so you can normally only reach them from within the internal network. We will simulate this as well. We have provided a script to configure the master and another for the worker node, which will change the hostname and also shut down the external network interface on the worker node.

Launch a 2-core worker VM

Exercise f1: Launch another VM that will become a worker

Configure master and worker VMs

Exercise f2: Configure the master node

cd
wget https://github.com/sara-nl/clouddocs/raw/gh-pages/UvA-course-20160615/code/makeme_master.sh
chmod +x makeme_master.sh
sudo ./makeme_master.sh
exit

Food for brain f2:

  • What has just happened?

Exercise f3: Configure the worker node

Let’s start by giving root a password on the worker node, so that we can use the VNC console.

sudo su -
passwd
exit

IMPORTANT!

  • Do not go any further until you are sure that you can log in as root on the worker node via the VNC console. In the Desktop version select “Other” and enter “root” as username and the password you have just set. Ignore the error in the initial screen (hit OK).

cd
wget https://github.com/sara-nl/clouddocs/raw/gh-pages/UvA-course-20160615/code/makeme_worker.sh
chmod +x makeme_worker.sh
sudo ./makeme_worker.sh 1 XXX.YYY.ZZZ.TTT  #replace XXX.YYY.ZZZ.TTT with the INTERNAL IP address of the master
#hit ENTER when prompted and wait a bit..

Food for brain f3:

  • Why do we recommend that you use the VNC console on this VM?
  • What has just happened!? Why do you need to become root? Why does the script require those parameters?

Configuring the firewall

The appliances that we deliver on the AppMarket come with a firewall running on the operating system, called UFW. MPI needs to communicate through the network between master and worker. They are both running a firewall. To avoid problems and because this is just a test scenario, we will trust all traffic coming from our internal interfaces.

Launching a run over multiple VMs

Exercise f4: Run over 2 VMs

time mpirun -np 4 -H <master_INTERNAL_ip>,<worker_INTERNAL_ip> /home/ubuntu/waveeq/wave4
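As an alternative to listing the hosts with -H, Open MPI can also read them from a hostfile in which every line names a host and the number of slots (processes) it offers. The sketch below writes such a file; make_hostfile and the file name hosts.txt are hypothetical, the addresses are placeholders, and the 2 slots per host assume the 2-core VMs used in this exercise.

```shell
# Hypothetical helper: write an Open MPI hostfile with a fixed number of slots per host.
# usage: make_hostfile <outfile> <slots_per_host> <host>...
make_hostfile() (
    out=$1; slots=$2; shift 2
    : > "$out"                            # truncate or create the file
    for host in "$@"; do
        echo "$host slots=$slots" >> "$out"
    done
)

# Placeholder addresses: replace them with the INTERNAL IPs of your master and worker
make_hostfile hosts.txt 2 10.0.0.1 10.0.0.2
cat hosts.txt

# Then, on the master:
#   time mpirun -np 4 --hostfile hosts.txt /home/ubuntu/waveeq/wave4
```

Stating the slot count explicitly means you do not depend on how your Open MPI version counts slots for hosts given with -H.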

Food for brain f4:

  • Is the output image from this multi-process the same as that from previous runs?
  • How many processes are running? (hint: use the top command on different terminals)
  • Do you see any significant time improvement as compared to the previous runs? Can you explain the improvement (or lack thereof)?

BONUS food for brain

This section contains extra questions that we thought would be nice for you to investigate. We invite you to work on them and think about them even after the workshop is finished.

Bonus 1: Can you make a batch of several runs (e.g.: 20) and calculate the average runtime and standard deviation?

Bonus 2: What do you need to do to make more workers available? Is our image enough? Go ahead: try to have 2 more workers of 1 core each. Then run the program among them. Does the run time reduce? And if you have 2 workers, 2 cores each? And 3 workers, 2 cores each? And… Is it worth parallelising a lot? Where is the optimum?

Bonus 3: It can become a problem when you have to copy and install your program and data “everywhere” in the same place. This can be alleviated by sharing your /home folder via NFS. Can you set that up?

Bonus 4: Having to download, compile yourself the source code of the tool you need, and install it, is a very common workflow. Do you have a tool in this situation? Can it benefit from MPI? Please, let us know. Can you successfully get it running? Can you parallelise it?

Bonus 5: Using SSH is enough to get going, but it becomes cumbersome when you have multiple things to run at a time and have to take care of users’ access, passwordless permissions… There exist cluster-building tools based on job queues, like Sun (now Oracle) Grid Engine, Torque, etc. Can you find out more? Can you set one up?

Bonus 6: MPI is a standard for one technique for parallelising computations: message passing. Another common technique is shared memory, for which OpenMP is a widely used API. You can read more about it at their website: http://openmp.org/wp/.



NOTE: Do not forget to shut down your VMs when you are done with your performance tests.

If you want more of the advanced exercises on the HPC Cloud, see Extras.

Time measurement hints

To help you measure times, we have put together the hints in this section.

TIMEFORMAT=$'elapsed %R' bash -c 'for i in {1..20}; do echo [iter $i]; time mpirun -n 2 ./wave4; done 2>&1 | tee times_n2.txt'

cat times_n2.txt | grep ^elapsed | sed -e s/.*\ // | awk '{sum += $1; sumsq+=$1*$1 } END {print "   avg " (sum/NR) "\nstddev " sqrt(sumsq/NR - (sum/NR) * (sum/NR))}'

An explanation on the previous code listing: