Re: the design goals of basin remote

From: Doug Jones <dfj23@drexel.edu>
Date: Thu May 10 2007 - 13:11:23 EDT

I took some time to read through the pMatlab paper, so now I think I can
respond to some of the questions.

I think it is true that pMatlab and basin-remote share similar design
goals. In both cases, the end goal is to present a more productive and
user-friendly interface to parallel computing.

As for the development method mentioned in the paper, it should indeed
be possible to work in a similar way with basin-remote. Nothing in the
kernel code prevents it from being executed in serial, outside of an MPI
environment. Basin-remote does not impose any restrictions beyond what
is inherent in the basin kernel, so the user could indeed do an
"import basin" in a local Python shell without needing to create a
parallel environment first. From there, the user could develop code that
executes serially on the local machine, which makes it easier to debug.
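
Just to make that concrete, here is a rough sketch of what such a
serial, local session might look like. The names I'm using for the
basin objects and methods below (Attribute, set, get) are only
placeholders to illustrate the workflow, not the actual kernel
interface:

    # Hypothetical sketch of serial, local development against the basin kernel.
    # All basin names below (Attribute, set, get) are placeholders, not the real API.
    import basin  # Python bindings to the basin kernel, imported in a plain local shell

    def build_positions(n):
        """Create a (hypothetical) basin attribute and fill it in serial."""
        positions = basin.Attribute("position", size=n)   # placeholder constructor
        for i in range(n):
            positions.set(i, (0.0, 0.0, float(i)))        # placeholder setter
        return positions

    # No MPI environment is started here; everything runs in one local process,
    # so it is easy to step through in a debugger or sprinkle in print statements.
    positions = build_positions(10)
    print(positions.get(3))                               # placeholder getter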

At the moment, one of the key differences I see between the workflow of
a system like pMatlab and that of basin-remote is that pMatlab assumes
the user is already accustomed to the functions, data structures, and
programming styles of Matlab. The parallel code is an extension of this:
through overloading, the user can continue to use those existing
operations and data structures, just in a parallel manner. With basin,
by contrast, a user of basin-remote will most likely not be working with
functions and data structures that already exist in the Python world;
instead they will need to adapt to basin-specific functions and data
structures. This is probably a necessary consequence of the problem
space, though, as there doesn't seem to be an obvious Python library
that could be overloaded to allow creation of parallel structures.
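
To make that contrast a bit more concrete, here is a toy sketch of what
the pMatlab-style overloading approach would look like if it were
transplanted into Python: a distributed array type that overloads the
operators users already know, so existing expressions keep working.
This is purely illustrative; it is not basin or pMatlab code, and the
"distribution" here is only pretend:

    # Toy illustration of the overloading idea, not basin or pMatlab code.
    class ToyDistArray(object):
        def __init__(self, data, nprocs=4):
            self.data = list(data)
            self.nprocs = nprocs      # pretend the data is split across nprocs ranks

        def __add__(self, other):
            # A real implementation would only touch the locally owned block and
            # communicate as needed; here we just add elementwise to show that the
            # familiar "a + b" syntax carries over to the distributed type.
            return ToyDistArray([x + y for x, y in zip(self.data, other.data)],
                                self.nprocs)

        def __repr__(self):
            return "ToyDistArray(%r, nprocs=%d)" % (self.data, self.nprocs)

    a = ToyDistArray([1, 2, 3])
    b = ToyDistArray([10, 20, 30])
    print(a + b)    # existing arithmetic syntax still works on the distributed type

A basin user, by contrast, would reach the same point by learning
basin-specific calls rather than by reusing syntax they already know.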

Which brings me to my next point: the interpreter's role in
basin-remote. With the code we have now, I'd view the interpreter as a
means to an end. That is, it provides an interface on the client
computer and a communications framework that allows the user to connect
to a running system and work interactively. It should also allow for
local interactive development in a serial manner.

However, as you mentioned, Dr. Char, I think that users will want to do
the majority of their work in this environment as opposed to having to
drop into C++. Enrico and I have been looking at various ways to make
basin work more closely with the Python environment. Some ideas we have
been looking at include converting a basin Attribute into a Python list
as well as providing a native Python iterator interface. I think
providing these features would go a long way toward letting a user do
more with existing Python code and tools if they find that necessary.
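
As a sketch of what those two features could look like from the user's
side, something like the following wrapper would do it; the size() and
get(i) accessors are placeholders for whatever the kernel actually
exposes:

    # Hypothetical sketch of list conversion and native iteration for a basin
    # Attribute; size() and get(i) are placeholder accessors, not the real API.
    class AttributeView(object):
        def __init__(self, attribute):
            self.attribute = attribute

        def __len__(self):
            return self.attribute.size()            # placeholder accessor

        def __iter__(self):
            # Native Python iteration: "for value in view: ..." just works.
            for i in range(self.attribute.size()):
                yield self.attribute.get(i)         # placeholder accessor

        def tolist(self):
            # Conversion to a plain Python list for use with existing tools.
            return list(self)

    # Usage, assuming attr is a basin Attribute obtained elsewhere:
    #   view = AttributeView(attr)
    #   total = sum(view)
    #   data = view.tolist()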

At least, that is how I see things potentially playing out. I think
that, given the choice of working in C++ or Python, most users would
only choose C++ when something needs to be done faster than it can be in
Python, or when they need exact control over an operation. With basin,
they may get the best of both worlds: computation-intensive functions or
operations can be coded in C++ and then exposed to Python like the rest
of the basin kernel, allowing the higher-level development to continue
in Python while the heavy lifting is done in machine code by the
compiled C++ functions.
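
As an analogy for that split (NumPy standing in for the compiled basin
kernel, since I can't show the kernel itself here), the pattern looks
like this: the driver code stays in plain Python while the inner loop
runs in compiled machine code:

    # Analogy only: NumPy stands in for the compiled basin kernel here.
    import numpy

    def pure_python_norm(values):
        # Clear and easy to modify, but every iteration pays interpreter overhead.
        total = 0.0
        for v in values:
            total += v * v
        return total ** 0.5

    def compiled_norm(values):
        # Same result, but the loop runs inside NumPy's compiled C routines.
        a = numpy.asarray(values, dtype=float)
        return float(numpy.sqrt(numpy.dot(a, a)))

    data = [float(i) for i in range(100000)]
    x, y = pure_python_norm(data), compiled_norm(data)
    assert abs(x - y) <= 1e-9 * max(x, y)   # identical up to floating-point rounding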

~Doug

Bruce Char wrote:
> I'm hanging out at Argonne this month so I won't be able to make the
> basin meeting. However, I've been checking out basin-related things
> while I've been here.
>
> 1. I read about pMatlab:
>
> @techreport{bliss-2006,
>   author={Bliss, Nadya and Kepner, Jeremy},
>   year={2006},
>   title={pMatlab Parallel Matlab Library},
>   url={http://www.citebase.org/abstract?id=oai:arXiv.org:astro-ph/0606464}
> }
>
> I wasn't aware that Jeremy Kepner started out as an astrophysics student
> at Princeton before he got involved in high-performance Matlab at Lincoln
> Labs. It sounds like his original motivations for the work were similar
> to those of basin, though.
>
> At any rate, the pMatlab system is interesting to me because it's another
> example of trying to make HPC programmers more productive by giving them
> an interpreter which has access to parallel computation on the back
> end. Like basin, parallel global arrays are a key way to get the
> distributed computation handled. However, they seem to take a logical
> step in making it possible to set up the global arrays through
> programming done in the interpreter, via the concept of "maps" which
> describe the data distribution; the various interpreter-level
> mathematical functions can then take a map as an additional parameter.
> Another feature that I liked was that they made it possible to switch
> back and forth fairly easily between distributed/parallel and
> sequential/non-distributed execution, making it easier to do debugging.
>
> They tout their belief that their user base was able to set up parallel
> versions of their codes very quickly once they had the sequential
> versions running in Matlab. Performance compared to C+MPI was not
> necessarily so wonderful.
> Is this the kind of case that we want to make for basin-remote?
>
> 2. I also had time to look at the "high performance python" paper I
> mentioned a few days ago:
>
> @article{PLW2007,
>   author={Luszczek, Piotr and Dongarra, Jack},
>   year={2007},
>   month={Summer},
>   title={High Performance Development for High End Computing with
>          Python Language Wrapper (PLW)},
>   journal={The International Journal of High Performance Computing
>            Applications},
>   volume={21},
>   number={2},
>   url={http://www.cs.utk.edu/~luszczek/pubs/plw-200605.pdf}
> }
>
> The paper, like basin, touts Python as a rapid way of developing HPC
> code, but tries to push the high-performance aspect a bit further. It
> says "if you are satisfied with performance in Python (with calls to
> C/C++), you can leave things as you wish, but if not you should consider
> compiling the Python program into C". The advantages are a) better
> performance, and b) portability to HPC platforms where Python does not
> run. For example, they say that BlueGene/L does not support dynamic
> libraries, while the Cray XT3 has a lightweight OS kernel missing
> features that Python assumes exist in the OS. So the compilation to C
> completely removes Python from the picture.
>
> In order to get compilation to happen, they annotate Python methods with
> type declarations, e.g. real and float. The annotations are just Python
> comments, and they are used in a fashion similar to #pragma directives
> in the C/C++ world.
>
>
> While compilation into C and user-level global arrays are two different
> things, I think they do come from an underlying concern: that if using
> the interpreter is such a productivity aid, users will want to deal with
> it more and do separate C++ programming less. What is the basin
> philosophy about the role of the interpreter in "doing science with
> basin"?
>
>
>