[Numpy-discussion] Hardware for Monte Carlo simulation

A.J. Rossini rossini at blindglobe.net
Tue Nov 27 09:44:02 CST 2001

>>>>> "HJL" == Hung Jung Lu <hungjunglu at yahoo.com> writes:

    HJL> Again, I have a tangential question. I am hitting the
    HJL> physical limit of the CPU (meaning things have been optimized
    HJL> down to assembly level), in order to achieve even higher
    HJL> performance, the only way to go is hardware.

    HJL> Is there any recommendation for fast machines at the price
    HJL> range of a few thousand dollars? (I cannot afford
    HJL> supercomputers or connection machines.) My purpose is to run
    HJL> Monte Carlo simulation. This means that a lot of scenarios
    HJL> can be run in parallel fashion. Of course I can just use
    HJL> regular cheap Pentium boxes... but they are kind of bulky,
    HJL> and I don't need any of the video, audio, USB features (I
    HJL> think 10 machines at 1GHz each would be the size of
    HJL> calculation power I need, or equivalently, a single machine
    HJL> at an equivalent 10GHz. Heck, if there are some specialized
    HJL> racks/boxes, I can wire the motherboards myself.) I am
    HJL> wondering what you people do for heavy number crunching? Are
    HJL> there any cheap yet specialized machines? What about machines
    HJL> with dual processor? I would imagine a lot of people in the
    HJL> number crunching world run into my situation, and since the
    HJL> number crunching machines don't require much beyond a
    HJL> motherboard and a small hard-drive, maybe there are already
    HJL> some cheap solutions out there.

The usual way is to build some "blackboxes", i.e. mobo/cpu/memory/NIC,
diskless or nearly diskless (you don't want to maintain machines :-).
Connect them using 100bT or faster networks (though 100bT should be

Do such things exist?  Sort of -- they tend to be more expensive than
building them yourself, but if you've got a reliable local supplier,
they can build them fairly cheaply for you.  I'd go with single or
dual Athlons, myself :-).  If power and maintenance are an issue, go
with duals; if not, maybe singles.

We use MOSIX (www.mosix.org) for transparent load balancing between
Linux machines, and it could be used on the machines I described
(using a floppy or CD to boot).  

The next question is whether some form of parallel RNG will help.  The
answer is "maybe".  I worked with a student who evaluated coupled
chains, and we couldn't do too much better.  
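
To make the parallel-RNG question concrete, here is a minimal sketch of
one simple scheme: give each node its own reproducible generator, seeded
from a shared base seed plus the node's id.  The helper names
(worker_rng, estimate_pi) and the toy pi estimate are my own
illustrations, not anything from MOSIX or PyPVM, and distinct seeds give
distinct streams but are not a proof of statistical independence.

```python
import hashlib
import random


def worker_rng(base_seed: int, worker_id: int) -> random.Random:
    """Return a reproducible generator for one worker (hypothetical helper)."""
    # Hash (base_seed, worker_id) so neighbouring worker ids get unrelated seeds.
    digest = hashlib.sha256(f"{base_seed}:{worker_id}".encode()).digest()
    return random.Random(int.from_bytes(digest[:8], "big"))


def estimate_pi(rng: random.Random, n_samples: int) -> float:
    """Toy Monte Carlo scenario: estimate pi from points in the unit square."""
    inside = sum(
        1
        for _ in range(n_samples)
        if rng.random() ** 2 + rng.random() ** 2 <= 1.0
    )
    return 4.0 * inside / n_samples


# Each "node" runs the same scenario with its own stream; here they run
# one after another in a single process just to show the seeding scheme.
estimates = [estimate_pi(worker_rng(2001, w), 20_000) for w in range(4)]
```

Rerunning a node with the same base seed and id reproduces its stream
exactly, which helps when one box dies halfway through a run.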

After that, the question is whether you want to figure out how to
post-process the results.  If you want to automate the whole thing
(and it isn't clear that it would be worth it, but...), you could use
PyPVM to front-end the sub-processes distributed on the network,
load-balanced at the system level by MOSIX.
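
For the post-processing step, a rough sketch of what the front-end might
do once the sub-processes report back: pool per-node summaries into one
overall estimate.  This assumes each node returns a (sample count, mean,
sample variance) triple; that reporting format and the pool_summaries
name are assumptions of mine, and the sketch does not touch PyPVM itself.

```python
import math
from typing import List, Tuple

# Assumed report from each node: (number of samples, sample mean, sample variance).
Summary = Tuple[int, float, float]


def pool_summaries(summaries: List[Summary]) -> Tuple[float, float]:
    """Combine per-node summaries into an overall mean and its standard error."""
    total_n = sum(n for n, _, _ in summaries)
    mean = sum(n * m for n, m, _ in summaries) / total_n
    # Total sum of squares = within-node part + between-node part.
    within = sum((n - 1) * v for n, _, v in summaries)
    between = sum(n * (m - mean) ** 2 for n, m, _ in summaries)
    variance = (within + between) / (total_n - 1)
    return mean, math.sqrt(variance / total_n)


# Two hypothetical nodes reporting summaries of their own samples.
overall_mean, std_error = pool_summaries([(3, 2.0, 1.0), (4, 5.5, 5.0 / 3.0)])
```

Shipping only three numbers per node keeps the network traffic trivial,
so the interconnect speed matters little for this step.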

Now for the problems -- MOSIX seems to have difficulties with Python.
Severe difficulties.  I don't know if it still holds true for recent
MOSIX releases.

(note that I use R (www.r-project.org) for most of my simulation work
these days, but am looking at Python for stat analyses, of which MCMC
tools are of interest).


A.J. Rossini				Rsrch. Asst. Prof. of Biostatistics
U. of Washington Biostatistics		rossini at u.washington.edu	
FHCRC/SCHARP/HIV Vaccine Trials Net	rossini at scharp.org
-------------- http://software.biostat.washington.edu/ --------------
FHCRC: M-W: 206-667-7025 (fax=4812)|Voicemail is pretty sketchy/use Email
UW:   T-Th: 206-543-1044 (fax=3286)|Change last 4 digits of phone to FAX
Rosen: (Mullins' Lab) Fridays, and I'm unreachable except by email.
