[SciPy-dev] New maximum entropy and Monte Carlo packages
Jonathan Taylor
jonathan.taylor at utoronto.ca
Wed Jan 18 06:29:32 CST 2006
Maybe there should be a place for "extra" packages, that is, packages
that only a subset of people will want. R works this way: by default
you get some useful packages, and it is fairly easy to add "extra"
ones.
Jon.
On 1/18/06, Ed Schofield <schofield at ftw.at> wrote:
> Hi all,
>
> I recently moved two new packages, maxent and montecarlo, from the
> sandbox into the main SciPy tree. I've now moved them back to the
> sandbox pending further discussion. I'll introduce them here and ask
> for feedback on whether they should be included in the main tree.
>
> The maxent package is for fitting maximum entropy models subject to
> expectation constraints. Maximum entropy models represent the 'least
> biased' models subject to given constraints. When the constraints are
> on the expectations of functionals -- the usual formulation -- maximum
> entropy models take the form of a generalized exponential family. A
> normal distribution, for example, is a maximum entropy distribution
> subject to mean and variance constraints.
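
In symbols, the exponential-family form described above can be written as follows. The names f and K match the ones used later in the message; theta (the parameter vector) and Z (the normalizing constant) are notation assumed here for illustration, and the sums over x become integrals on a continuous sample space.

```latex
p_\theta(x) = \frac{1}{Z(\theta)} \exp\Big( \sum_i \theta_i f_i(x) \Big),
\qquad
Z(\theta) = \sum_x \exp\Big( \sum_i \theta_i f_i(x) \Big),
\qquad
\mathbb{E}_{p_\theta}[f_i(X)] = K_i .
```

With f_1(x) = x and f_2(x) = x^2 on the real line, the density is proportional to exp(theta_1 x + theta_2 x^2), which is a normal density whenever theta_2 < 0.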
>
> The maxent package contains one main module and one module with utility
> functions. Both are entirely in Python. (I have now removed the F2Py
> dependency.) The main module supports fitting models on either small or
> large sample spaces, where 'large' means continuous or otherwise too
> large to iterate over. Maxent models on 'small' sample spaces are
> common in natural language processing; models on 'large' sample spaces
> are useful for channel modelling in mobile communications, spectrum and
> chirp analysis, and (I believe) fluid turbulence. Some simple examples
> are in the examples/ directory. The simplest use is to define a list of
> functions f, an array of desired expectations K, and a sample space, and
> use the commands
>
> >>> model = maxent.model(f, samplespace)
> >>> model.fit(K)
>
> You can then retrieve the fitted parameters directly or analyze the
> model in other ways.
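
To make the idea of fitting on a 'small' sample space concrete, here is a minimal pure-Python sketch. It is not the maxent package's code or API: the names fit_maxent and model_probs are hypothetical, and plain gradient descent on the dual is assumed only for simplicity (the package may well use a different optimizer). The example asks for the least biased distribution on a die's faces whose mean is 4.5.

```python
import math


def model_probs(theta, F):
    """Return p(x_j) proportional to exp(sum_i theta_i * F[j][i])."""
    scores = [sum(t * fi for t, fi in zip(theta, row)) for row in F]
    m = max(scores)  # subtract the max score for numerical stability
    weights = [math.exp(s - m) for s in scores]
    Z = sum(weights)
    return [w / Z for w in weights]


def fit_maxent(features, samplespace, K, steps=2000, lr=0.1):
    """Hypothetical helper: find theta so that E_p[f_i] is close to K[i]."""
    # Feature matrix: F[j][i] = f_i(x_j) for every point x_j.
    F = [[f(x) for f in features] for x in samplespace]
    theta = [0.0] * len(features)
    for _ in range(steps):
        p = model_probs(theta, F)
        # Model expectations E_p[f_i] under the current parameters.
        expect = [sum(pj * row[i] for pj, row in zip(p, F))
                  for i in range(len(theta))]
        # The dual log Z(theta) - theta.K has gradient E_p[f] - K;
        # step downhill along it.
        theta = [t - lr * (e - k) for t, e, k in zip(theta, expect, K)]
    return theta, model_probs(theta, F)


# One feature f(x) = x on the faces of a die, with target mean 4.5.
theta, p = fit_maxent([lambda x: x], range(1, 7), [4.5])
```

The fixed step size is adequate for this toy problem but is not a general recipe; features on very different scales would need a smaller step or a proper optimizer.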
>
> I've been developing the maxent algorithms and code for about 4 years.
> The code is very well commented and should be straightforward to maintain.
>
>
> The montecarlo package currently does only one thing. It generates
> discrete variates from a given distribution. It does this FAST. On my
> P4 it generates over 10^7 variates per second, even for a sample space
> with 10^6 elements. The algorithm is the compact 5-table lookup sampler
> of Marsaglia. The main module, called 'intsampler', is written in C.
> There is also a simple Python wrapper class around this called
> 'dictsampler' that provides a nicer interface, allowing sampling from a
> distribution with arbitrary hashable objects (e.g. strings) as labels
> instead of {0,1,2,...}. dictsampler has
> slightly more overhead than intsampler, but is also very fast (around
> 10^6 per second for me with a sample space of 10^6 elements labelled
> with strings). An example of using it to sample from this discrete
> distribution:
>
>   x       'a'      'b'       'c'
>   p(x)    10/180   150/180   20/180
>
> is:
>
> >>> table = {'a':10, 'b':150, 'c':20}
> >>> sampler = dictsampler(table)
> >>> sampler.sample(10**4)
> array([b, b, a, ..., b, b, c], dtype=object)
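
For readers without the package, the behaviour of the example above can be imitated with the standard library alone. This is only a sketch: DictSampler is a hypothetical stand-in, it returns a plain list rather than an array, and it uses a cumulative-weight table with binary search, not Marsaglia's 5-table lookup method, so it will be far slower than the C module described here.

```python
import bisect
import itertools
import random


class DictSampler:
    """Hypothetical stand-in: draw labels in proportion to their weights."""

    def __init__(self, table, seed=None):
        self.labels = list(table)
        # Running totals of the unnormalised weights,
        # e.g. [10, 160, 180] for {'a': 10, 'b': 150, 'c': 20}.
        self.cumulative = list(
            itertools.accumulate(table[k] for k in self.labels))
        self.total = self.cumulative[-1]
        self.rng = random.Random(seed)

    def sample(self, n):
        last = len(self.labels) - 1
        draws = []
        for _ in range(n):
            # Uniform point in [0, total); pick the first running total above it.
            u = self.rng.random() * self.total
            index = bisect.bisect_right(self.cumulative, u)
            # Guard against floating-point rounding at the upper edge.
            draws.append(self.labels[min(index, last)])
        return draws


table = {'a': 10, 'b': 150, 'c': 20}
sampler = DictSampler(table, seed=0)
draws = sampler.sample(10**4)
```

Each draw costs a binary search over the table, so the per-sample time grows with the logarithm of the sample-space size, unlike a table-lookup sampler.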
>
> The montecarlo package is very small (and not nearly as impressive as
> Christopher Fonnesbeck's PyMC package), but the functionality that is
> there would be an efficient foundation for many discrete Monte Carlo
> algorithms.
>
> I'm aware of the build issue Travis Brady reported with MinGW not
> defining lrand48(). I can't remember why I used this, but I'll adapt it
> to use lrand() instead and report back.
>
>
> Would these packages be useful? Are there any objections to including them?
>
>
> -- Ed