The Intel Xeon Phi is a specialized, massively parallel processor that attaches to a host CPU, which offloads calculations to it for speed. The Phi processor cards are highly efficient at performing large calculations in parallel.
The Intel Xeon Phi is a coprocessor on a board that attaches to a host CPU. The board carries 60 cores, each of which supports 4 threads, for a total of 240 threads. The Phi is essentially a massively parallel computer on a card with access to a large shared memory.
The computing environment, known as the Intel Manycore Platform Software Stack (MPSS), is based on Linux. The Phi itself presents a nearly complete Linux interface to its users. For instance, you log in on the host, that is, the CPU to which the Phi is attached. You can then access a Phi coprocessor (called mic0, mic1, ...) via a standard ssh interface. You can copy executables via scp, for instance scp abc login_name@mic0: where abc is an executable. The Phi presents a directory to the user, where files are handled in the same fashion as on any other Linux-based computer.
Description of the system
Our Intel Xeon Phi computer comprises 2 Xeon CPUs on the motherboard, 128 GB of memory, and two Phi cards, model 5110P, with 8 GB of shared memory on each card. It is named Dirac.physics.drexel.edu. Once logged in on Dirac via ssh, the user can ssh into either of the Phi cards, labeled mic0 and mic1 respectively.
A convenient way to use the Phi coprocessor is this:
The Intel Xeon Phi presents the familiar Linux environment to the user. The user connects via ssh to the host CPU front end and from there connects via ssh to any of the Phis. A C or Fortran code should be compiled on the front-end host CPU with the Intel compiler (icc for a C code), using a switch that requests cross-compilation for the Phi. For instance, to compile and execute a code called foo.c, do
icc -mmic foo.c -o foo
Then upload the executable to the Phi via
scp foo mic0:
ssh to the Phi
ssh -X user_name@mic0
and execute the code via
./foo
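The three steps above (cross-compile, upload, run) can be collected in a small helper script. The following Python sketch only assembles the command lines and prints them. The function name phi_workflow and the default login user_name are made up for illustration, and handing ./foo directly to ssh is an assumption that replaces the interactive login shown above.

```python
import shlex


def phi_workflow(source, user="user_name", card="mic0"):
    """Return the command lines to cross-compile, upload and run a C code."""
    executable = source.rsplit(".", 1)[0]      # foo.c -> foo
    target = f"{user}@{card}"                  # e.g. user_name@mic0
    return [
        ["icc", "-mmic", source, "-o", executable],  # cross-compile on the host
        ["scp", executable, f"{target}:"],           # copy to the home directory on the card
        ["ssh", target, f"./{executable}"],          # run natively on the card
    ]


# Print the commands instead of running them (a dry run).
for argv in phi_workflow("foo.c"):
    print(shlex.join(argv))
```

To actually run the steps on the host, each list can be passed to subprocess.run(argv, check=True).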
This was easy! However, it may not give the expected very large gain in speed for more complicated codes.
The announced large speed gains in executing applications on the Phi derive from parallelization (60 cores, 240 threads on the Phi), vectorization, unrolled loops, ... These in turn are obtained through compiler switches, the use of OpenMP, communication with the compiler via #pragma directives, the use of global environment variables and, less often, rewriting portions of the code.
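Of the mechanisms listed above, environment variables are the simplest to try. OMP_NUM_THREADS is the standard OpenMP variable that sets the number of threads used in parallel regions. A minimal sketch, assuming a natively compiled OpenMP executable named foo on the card and that one thread per hardware thread (60 x 4) is wanted; the function names are illustrative only:

```python
def omp_settings(cores=60, threads_per_core=4):
    """Environment for one OpenMP thread per hardware thread (60 x 4 = 240 on the Phi)."""
    return {"OMP_NUM_THREADS": str(cores * threads_per_core)}


def remote_command(executable, settings):
    """Build 'VAR=value ./executable', suitable as the remote command given to ssh."""
    assignments = " ".join(f"{name}={value}" for name, value in sorted(settings.items()))
    return f"{assignments} ./{executable}"


print(remote_command("foo", omp_settings()))
```

The resulting string can be passed to ssh as the remote command, for instance ssh user_name@mic0 'OMP_NUM_THREADS=240 ./foo'. Whether 240 threads is the best choice depends on the code; it is worth timing runs with fewer threads per core as well.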
The Wikipedia page is often the best one to read first, and this case is no exception: Wikipedia
An interesting tutorial on OpenMP by Blaise Barney, Lawrence Livermore National Laboratory, describes the approach well.
Intel maintains numerous references and descriptions on its web sites. In particular, the book Intel Xeon Phi Coprocessor High Performance Programming by Jim Jeffers and James Reinders, both at Intel, teaches the subject by examples. This book can be previewed on the web.
This book contains non-trivial examples that illustrate the use of OpenMP and compiler switches to get fast execution of codes on the Phi. I bundled these and some very illustrative simple codes in a tar archive.
See: phi_sample_codes.tar.
See: Chapter_4.tar.
You may also want to look at the following tutorials and references:
Five misconceptions about the Phi: Five misconceptions
Tutorial on OpenMP: Tutorial
A long list of the syntax and purpose of OpenMP commands: Syntax and purpose
A hands-on approach to OpenMP: Hands-on OpenMP tutorial
OpenMP by examples: Examples
A specialized OpenMP tutorial for the Phi card: Special OpenMP
Reaching a teraflops on a Phi card: teraflops
Corrections
John and Joe worked to correct the libraries that use the proper software.
echo "export LD_LIBRARY_PATH=\"$(MICDIR):${LD_LIBRARY_PATH}\"" > setup.sh
Makefile
icc -openmp -mmic -vec-report=3 -O3 hellomem.c -o hellomem
scp hellomem vallieres@mic0:
ssh -X vallieres@mic0
Needs modification for specific cases.
Takes a data file as a command line arg.
John (Jack) T. O'Brien Physics Undergraduate Drexel University 908-256-4031 john.t.obrien@drexel.edu
plotData.py
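import sys
import numpy as np
from matplotlib.pyplot import subplots, show

# Read whitespace-separated integers (0-255) from the file named on the command line.
with open(sys.argv[1], 'r') as f:
    values = np.array([int(v) for v in f.read().split()], dtype=np.uint8)

# Interpret the values as a 1024 x 1024 image with 4 channels (RGBA) and display it.
subplots(1, 1)[1].imshow(values.reshape(1024, 1024, 4))
show()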
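plotData.py reads 1024 x 1024 x 4 whitespace-separated integers between 0 and 255, which imshow displays as RGBA pixels. To try the script without output from a run on the Phi, a test file in that layout can be generated. This is a sketch in pure Python; the function name write_test_image and the gradient pattern are arbitrary choices for illustration.

```python
import os
import tempfile


def write_test_image(path, size=1024):
    """Write size x size RGBA pixels as whitespace-separated integers in 0..255."""
    if size < 2:
        raise ValueError("size must be at least 2")
    with open(path, "w") as f:
        for row in range(size):
            pixels = []
            for col in range(size):
                red = 255 * col // (size - 1)      # increases from left to right
                green = 255 * row // (size - 1)    # increases from top to bottom
                pixels.append(f"{red} {green} 0 255")  # no blue, fully opaque
            f.write(" ".join(pixels) + "\n")


# Small demonstration: a 4 x 4 image holds 4 * 4 * 4 values.
with tempfile.TemporaryDirectory() as tmp:
    demo = os.path.join(tmp, "demo.dat")
    write_test_image(demo, size=4)
    with open(demo) as f:
        print(len(f.read().split()))
```

Calling write_test_image('test.dat') with the default size produces a file that matches the 1024 x 1024 x 4 shape hard-coded in plotData.py, which can then be run as python plotData.py test.dat.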