basin_list: Re: IPython information

From: Douglas Jones <dfj23@drexel.edu>
Date: Mon Dec 18 2006 - 16:18:38 EST

I have installed IPython and all of the tools necessary for parallel
computation on Frinkiac.

You should be able to follow the tutorial found here:
http://ipython.scipy.org/moin/Parallel_Computing/Tutorial

except for the section involving the numpy module as that is not
installed on Frinkiac.

Thanks,
~doug

Douglas Jones wrote:
> Hello BASIN group,
>
> Here is a list of links for more information about IPython.
>
> The IPython shell.
> http://ipython.scipy.org/moin/About
>
> The Parallel Computation team's main page.
> http://ipython.scipy.org/moin/IPython1
>
> A good introduction to the features of their system.
> http://modular.math.washington.edu/sage/talks/2006-02-23-granger-parallel_ipython/Granger-parallel_IPython.pdf
>
>
> Another presentation about their parallel system
> http://ipython.scipy.org/moin/About/Presentations?action=AttachFile&do=get&target=ipython_cu06.pdf
>
>
> More presentations can be found here
> http://ipython.scipy.org/moin/About/Presentations
>
> IPython is part of a larger group of Scientific Tools for Python
> called SciPy
> http://www.scipy.org/
>
>
> In addition, I'm attaching the reply from Brian Granger in regard to
> my questions about IPython and parallel computation. Brian is one of
> the main developers for the IPython parallel tools.
>
>
> I'd be happy to answer any questions about IPython or how IPython and
> BASIN will interact.
>
> Thank you,
> ~Doug
>
> ------------------------------------------------------------------------
>
>
> ---------- Forwarded message ----------
> From: Brian Granger <ellisonbg.net@gmail.com>
> Date: Dec 14, 2006 12:51 PM
> Subject: Re: [IPython-dev] IPython1 parallel questions
> To: Douglas Jones <dfj225@gmail.com>
> Cc: ipython-dev@scipy.org, ipython-user@scipy.org
>
>
> Douglas,
>
> See my comments inline below.
>
>
>> I recently discovered IPython1 and its parallel capabilities. Once I
>> read about the feature set, I became extremely excited as IPython1
>> seems to solve many of the interactive parallel computing problems
>> that I had hoped to solve for a project that I am a developer on.
>>
>> Let me start by framing my project. Our code currently exists as a
>> library written in C++. Our library does parallel computation using
>> MPI. We currently have code that exposes this to Python in an
>> interactive manner, and have been looking for ways to expand this to a
>> fuller, more robust implementation. I'm hoping that IPython1 will be
>> the perfect piece to create our collaborative, interactive
>> environment.
>>
>
> Nice, this is one of the main use cases we had in mind when
> designing ipython1. It should work well for this.
>
>
>> That said, as I investigate IPython further and start to develop some
>> prototype code, I have some questions for the community and
>> developers.
>>
>> 1) Are there any outstanding issues or problems with the IPython1 code
>> base that inhibit parallel computation? I've been experimenting with
>> the code and its features for the past few days in addition to looking
>> through the bug list, and from what I can tell there are none. Are
>> there any features mentioned in IPython's documentation or the
>> presentations on the site that haven't been implemented yet?
>>
>
> Not really. IPython1 is already being used for real science and
> people don't seem to be limited in any significant way. With that
> said, there are a number of things we are still working on. Here is a
> taste:
>
> 1. Optimizing how large objects are sent between the client and
> engines. In our current approach, the controller becomes a bottleneck
> when you try to use push/pull to send really big things (hundreds of
> MB or more). With that said, if you want to send large objects
> around, you might want to rethink how you are parallelizing your
> algorithm.
>
> 2. Task farming. While the architecture is set up for task farming,
> we have not implemented a few parts of it. We are actively (as in
> this week) working on this.
>
> 3. Security.
>
> 4. Scalability. Currently all the engines connect to a single
> controller. We have tested this on 128 processors and it works fine.
> The problem is that most systems have a per-process file-descriptor
> limit that is not much higher than that (like 256). We would like to
> explore ways (multiple controllers?) of getting it to scale beyond
> 128-256.
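
The per-process file-descriptor limit mentioned in item 4 can be checked
from Python on Unix-like systems. What follows is a minimal sketch using the
standard library `resource` module, which is not part of IPython1. The
helper `fits_within_limit` and its `overhead` allowance are hypothetical,
and the sketch assumes that each connected engine costs the controller
roughly one descriptor.

```python
import resource  # standard library, available on Unix-like systems only

# The soft limit is what the process is currently held to; the hard limit
# is the ceiling the soft limit can be raised to without extra privileges.
soft_limit, hard_limit = resource.getrlimit(resource.RLIMIT_NOFILE)


def fits_within_limit(n_engines, limit, overhead=16):
    """Rough check: one descriptor per engine plus an assumed overhead.

    The overhead value (stdin/stdout/stderr, log files, client sockets)
    is an illustrative guess, not a measured figure.
    """
    if limit == resource.RLIM_INFINITY:
        return True
    return n_engines + overhead <= limit


print("soft limit:", soft_limit, "hard limit:", hard_limit)
print("128 engines fit under the soft limit:",
      fits_within_limit(128, soft_limit))
```

If the soft limit is lower than the hard limit, it can usually be raised
for the controller process (for example with `ulimit -n` in the launching
shell) before a multi-controller design becomes necessary.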
>
>
>> 2) Is there any authentication or security between the client shell
>> and the IPython controller?
>>
>
> Not yet, but we are working on it.
>
>
>> 3) What are the proper ways to configure the operation of the client
>> and the controller or the engines? From what I've seen it looks like
>> there is an API that allows the user to set certain options. Can
>> configuration files be used?
>>
>
> Yes, we have a very powerful configuration system. For examples see
> the ipython1/configfiles directory. You can simply copy these over to
> your ~/.ipython directory and start playing around.
> The documentation on this part of things is still not great though.
> Please let us know if you have questions.
>
>
>> Some configuration issues I'm thinking of right now involve output.
>> For instance, is there a way to turn off the feedback from each node
>> for every line of code? I would imagine that if you had a cluster with
>> many nodes running, you would not want to see this feedback. Another
>> issue is how output on stdout from C++ code is handled. I noticed that
>> right now, any output on stdout for a simple C++ module that I have
>> exposed to python and loaded in each engine is simply written
>> by the engine to the console. Is there some way to redirect this
>> output?
>>
>
> There are two ways of executing code: blocking and non-blocking:
>
> rc.executeAll('a=10', block=True)
>
> This mode will wait until the command has been run and it will bring
> back the stdout/stderr and print it to the screen. If the remote
> command is long running, your local ipython session will remain
> blocked until the command is complete.
>
> rc.executeAll('import time; time.sleep(1000000)', block=False)
>
> In non-blocking mode, execute returns immediately after _submitting_
> the command. Furthermore, it won't automatically bring back and print
> the stdout/stderr of the command.
>
> You can also set all commands to block or not by using the block
> attribute of the RemoteController object:
>
> rc.block = False
>
> I usually use block=True for debugging, but then set block=False for
> long running things. Also, you can always get the stdout/stderr of a
> previously run command by using the %result magic:
>
> %result      # print the stdout/stderr of the last remote command
>
> %result 10   # print the stdout/stderr of the 10th remote command
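
The blocking versus non-blocking distinction described above is a general
pattern and can be tried without a running controller. The sketch below
does not use the IPython1 API. It substitutes the standard library
`concurrent.futures` module as a stand-in, and `remote_command` is a
hypothetical placeholder for work done on an engine.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def remote_command(seconds):
    """Hypothetical stand-in for a command running on an engine."""
    time.sleep(seconds)
    return "slept %.1f s" % seconds


with ThreadPoolExecutor(max_workers=2) as pool:
    # Blocking style: submit, then wait right away for the result.
    blocking_result = pool.submit(remote_command, 0.1).result()

    # Non-blocking style: submit() hands back a Future immediately,
    # so the caller stays free while the command is still running.
    start = time.time()
    pending = pool.submit(remote_command, 0.5)
    submit_time = time.time() - start

    # Fetch the outcome later, much like asking for a past result.
    later_result = pending.result()

print(blocking_result)
print("submit returned after %.3f s" % submit_time)
print(later_result)
```

As in the email, the blocking form is convenient for debugging because
errors and output show up at once, while the non-blocking form suits
long-running work.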
>
>
>> 4) When objects are sent from the kernel to the client, is the only
>> prerequisite to this that the object being sent is able to be pickled?
>> I suppose the client would also have to have the class code for any
>> objects imported?
>>
>
> Yes. Numpy arrays are sent using their raw buffers so for that case
> you don't have to pay the price of pickling.
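
The point about needing the class code on the receiving side follows from
how pickle works: the byte stream names a class by module and name instead
of carrying its definition. Here is a minimal sketch using only the
standard library. `Fraction` is just a convenient example class, and
`can_pickle` is a hypothetical helper that is not part of IPython1.

```python
import pickle
from fractions import Fraction

value = Fraction(3, 4)
payload = pickle.dumps(value)

# The stream refers to the class by module and name. The class body is
# not included, so the receiver must be able to import fractions.Fraction.
names_class_by_reference = b"fractions" in payload and b"Fraction" in payload

# Round trip: works here because the class is importable on this side.
restored = pickle.loads(payload)


def can_pickle(obj):
    """Return True if pickle.dumps accepts the object."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


# A lambda has no importable name, so it cannot be pickled by reference.
lambda_ok = can_pickle(lambda x: x + 1)

print("class named in stream:", names_class_by_reference)
print("round trip equal:", restored == value)
print("lambda picklable:", lambda_ok)
```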
>
>
>> 5) Are there any known projects that use IPython1 for interactive
>> scientific computing? I'd be really interested to see ones that also
>> support visualization of distributed data.
>>
>
> Yes. Two examples:
>
> 1. At my company (Tech-X), there is a group of people using ipython1
> to do parallel data analysis on supercomputers. They start with
> 50-100 GBs of data in 1000-2000 hdf5 data files and need to do a bunch
> of calculations that involve data from multiple files. They use
> ipython1 to first reduce the data they want (in parallel) to a single
> file and then run an algorithm (again in parallel) over a set of
> parameters. They were able to parallelize this code in 2 days and
> it shows nice linear scaling.
>
> 2. Fernando Perez (the creator of ipython) is using ipython1 in his
> research in applied mathematics. His algorithm uses multiresolution
> analysis to solve high dimensional partial differential equations. He
> has just begun (in the last 2 weeks) to parallelize his code using
> ipython1, so I don't think he is in production mode. His case is also
> non-trivial as it 1) needs automatic load balancing and 2) has lots of
> communications.
>
> The reason we started working on this is that both Fernando and I are
> scientists (both got our Ph.D.s in Theoretical Physics) and we wanted
> these tools to exist so we could use them for our own research.
>
> There are some other folks on the list that have been playing around with
> ipython1, but I am not sure if anyone else has moved into production
> mode yet.
>
>
>> Well, I think that covers all my initial questions. Thank you for
>> bearing with this long post and my newbie questions. I'm really looking
>> forward to working more with IPython as it seems like an amazing piece
>> of software.
>>
>
> Thanks! Please let us know if you have more questions/comments or ideas.
>
> Brian
>
>
>> Thank you,
>> ~doug
Received on Mon Dec 18 16:18:24 2006
