Intel Xeon Phi

Back to course contents

Intel Xeon Phi
Phi Based Computing
dirac.physics.drexel.edu
Programming for the Phi
Learning how to compute FAST on the Phi

Intel Xeon Phi

The Intel Xeon Phi is a specialized, massively parallel coprocessor that attaches to a host CPU and takes over (offloads) calculations to speed them up. The Phi cards are highly efficient at performing large calculations in parallel.

Intel description

Phi based computing

The Intel Xeon Phi is a coprocessor on a board that attaches to a host CPU. The board carries 60 cores, each of which supports 4 hardware threads, for a total of 240 threads. The Phi is essentially a massively parallel computer on a card with access to a large shared memory.
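How many of those threads an OpenMP program actually receives is decided by the OpenMP runtime, and it can be checked from inside a parallel region. The sketch below is a minimal illustration, not Intel tooling: the function name count_team_threads is hypothetical, and it assumes the code is built with OpenMP enabled (for icc, the -openmp switch). Built without OpenMP, it simply reports 1.

```c
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: return the number of threads in an OpenMP team.
   Falls back to 1 when the file is compiled without OpenMP support. */
int count_team_threads(void)
{
    int n = 1;  /* serial fallback */
#ifdef _OPENMP
    #pragma omp parallel
    {
        /* one thread records the team size; single ends with a barrier */
        #pragma omp single
        n = omp_get_num_threads();
    }
#endif
    return n;
}
```

Called from main in a native run on the card, it typically reports one thread per logical CPU the card exposes, unless the OMP_NUM_THREADS environment variable requests a different number.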

The computing environment, known as the Intel Manycore Platform Software Stack (MPSS), is based on Linux, and the Phi itself presents a nearly complete Linux interface to its users. You (the user) log in to the host, i.e. the CPU to which the Phi is attached. From there you can reach a Phi coprocessor (named mic0, mic1, ...) through a standard ssh interface and copy executables to it with scp, for instance scp abc login_name@mic0: where abc is an executable. The Phi presents a directory to the user, where files are handled in the same fashion as on any other Linux-based computer.

A convenient way to use the Phi coprocessor is this:

Compile the program on the host CPU with the Intel C compiler, using the -mmic switch to specify that the executable is for the Phi's Many Integrated Core (MIC) architecture.
Copy the executable to the Phi via scp
Log on to the Phi via ssh
Execute the code on the Phi via ./name_of_executable
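A tiny test program is handy for checking this workflow end to end. The sketch below assumes that the Intel compiler defines the macro __MIC__ when compiling with -mmic; the function names compiled_for_mic and report_target are hypothetical.

```c
#include <stdio.h>

/* Return 1 if this file was compiled for the coprocessor (icc -mmic),
   0 otherwise. Assumes __MIC__ is defined only for -mmic builds. */
int compiled_for_mic(void)
{
#ifdef __MIC__
    return 1;
#else
    return 0;
#endif
}

/* Print which architecture the executable was built for. */
void report_target(void)
{
    printf("built for %s\n",
           compiled_for_mic() ? "the Xeon Phi (MIC)" : "the host CPU");
}
```

Adding int main(void) { report_target(); return 0; } completes the program. Compiled with -mmic, copied with scp, and run on mic0, it should report the Xeon Phi, while an ordinary host build reports the host CPU.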

Software according to Intel

Back to top of page

dirac.physics.drexel.edu

Our Intel Xeon Phi computer has 2 Xeon CPUs on the motherboard, 128 GB of memory, and two (2) Phi cards, model 5110P, with 8 GB of shared memory on each card. It is named dirac.physics.drexel.edu. Once logged in to Dirac via ssh, the user can ssh into either of the Phi cards, named mic0 and mic1 respectively.

Description of the system

Model 5110p


Programming for the Phi

The Intel Xeon Phi presents the user with the well-known Linux environment. The user logs in to the host CPU front end via ssh and from there can ssh to either Phi. A C or Fortran code should be compiled on the front-end host CPU with the Intel compiler (icc for C code), using a switch that requests cross-compilation for the Phi. For instance, to compile and execute a code called foo.c, do
icc -mmic foo.c -o foo

Then copy the executable to the Phi via
scp foo mic0:
ssh to the Phi
ssh -X user_name@mic0

and execute the code via ./foo

This was easy! However, for more complicated codes it may not give the expected very large gain in speed.
The advertised large speed gains on the Phi come from parallelization (60 cores, 240 threads on the Phi), vectorization, unrolled loops, and similar techniques. These are in turn obtained through compiler directives: the use of OpenMP, #pragma directives that communicate with the compiler, environment variables and, less often, rewriting portions of the code.
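As an illustration of combining two of these ingredients, the loop below is split across threads and vectorized within each thread by a single OpenMP directive. It is a sketch under stated assumptions: saxpy is a hypothetical example, not one of the book's codes, and the parallel for simd construct needs a compiler with OpenMP 4.0 support and OpenMP enabled. Without OpenMP the pragma is ignored and the loop runs serially with identical results.

```c
#include <stddef.h>

/* Hypothetical example: y[i] = a * x[i] + y[i].
   "parallel for" divides the iterations among OpenMP threads;
   "simd" asks the compiler to vectorize each thread's chunk.
   restrict promises x and y do not overlap, which helps vectorization. */
void saxpy(size_t n, float a, const float *restrict x, float *restrict y)
{
    #pragma omp parallel for simd
    for (size_t i = 0; i < n; i++)
        y[i] = a * x[i] + y[i];
}
```

The thread count can then be chosen at run time through an environment variable rather than in the code, e.g. export OMP_NUM_THREADS=240 on the Phi before running the executable; the Intel runtime's KMP_AFFINITY variable additionally controls how threads are placed on the cores.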


Learning how to compute FAST on the Phi
The large speed gains on the Phi are mostly achieved through OpenMP (Open Multi-Processing). OpenMP is an API that supports multi-platform shared-memory multiprocessing programming in C, C++, and Fortran.
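The classic first OpenMP exercise, estimating pi by numerical integration, shows the core ideas: a parallel for directive splits the loop iterations among threads, and a reduction clause combines each thread's partial sum safely. This is a generic sketch (estimate_pi is a hypothetical name), not code from the references on this page; it computes pi as the integral of 4/(1+x^2) from 0 to 1 with the midpoint rule.

```c
/* Hypothetical example: estimate pi = integral_0^1 4/(1+x^2) dx
   with the midpoint rule on n strips.
   reduction(+:sum) gives every thread a private partial sum and adds
   the partial sums together when the loop ends, avoiding a data race. */
double estimate_pi(long n)
{
    const double h = 1.0 / (double)n;  /* strip width */
    double sum = 0.0;

    #pragma omp parallel for reduction(+:sum)
    for (long i = 0; i < n; i++) {
        double x = (i + 0.5) * h;      /* midpoint of the i-th strip */
        sum += 4.0 / (1.0 + x * x);
    }
    return sum * h;
}
```

Without the reduction clause, all threads would update the shared variable sum at once and the result would be unpredictable.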

The Wikipedia article is often the best page to read first, and this case is no exception: Wikipedia

An interesting tutorial on OpenMP by Blaise Barney of Lawrence Livermore National Laboratory describes the approach well: openMP

Intel maintains numerous references and descriptions on its web sites. In particular, the book Intel Xeon Phi Coprocessor High-Performance Programming by Jim Jeffers and James Reinders, both at Intel, teaches the subject by example. This book can be previewed on the web.

This book contains non-trivial examples that illustrate the use of OpenMP and compiler switches to get fast execution of codes on the Phi. I bundled these, along with some simple but very illustrative codes, in a tar archive.

See: phi_sample_codes.tar.

See: Chapter_4.tar.

You may also want to look at the following tutorials and references:

Five misconceptions about the Phi: Five misconceptions

Tutorial on OpenMP: Tutorial

A long list of the syntax and purpose of OpenMP commands: Syntax and purpose

A hands-on approach to OpenMP: Hands-on openMP tutorial
OpenMP by examples: Examples
A specialized OpenMP tutorial for the Phi card: Special openMP

Reaching a teraflops on a Phi card: teraflops

Corrections

John and Joe worked to correct the libraries so that they use the proper software.

echo "export LD_LIBRARY_PATH=\"$(MICDIR):${LD_LIBRARY_PATH}\"" > setup.sh

Makefile

Chapter_2.tar (corrected)

icc -openmp -mmic -vec-report=3 -O3 hellomem.c -o hellomem
scp hellomem vallieres@mic0:

ssh -X vallieres@mic0

Needs modification for specific cases.

Takes a data file as a command-line argument.

John (Jack) T. O'Brien Physics Undergraduate Drexel University 908-256-4031 john.t.obrien@drexel.edu

plotData.py
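import sys

import numpy as np
from matplotlib.pyplot import subplots, show

# Read the whitespace-separated integers in the file named on the command line
# and store them as 8-bit unsigned values (np.fromiter also works in Python 3,
# where map() returns an iterator rather than a list).
with open(sys.argv[1], 'r') as f:
    pixels = np.fromiter((int(v) for v in f.read().split()), dtype=np.uint8)

# Display the data as a 1024 x 1024 image with 4 channels per pixel.
subplots(1, 1)[1].imshow(pixels.reshape(1024, 1024, 4))
show()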
