[SciPy-dev] New maximum entropy and Monte Carlo packages

Jonathan Taylor jonathan.taylor at utoronto.ca
Wed Jan 18 06:29:32 CST 2006


Maybe there should be some place for "extra" packages.  That is,
packages that only a subset of people will want.  This is like in R
where by default you get some useful packages, and it is fairly easy
to add on "extra" packages.

Jon.

On 1/18/06, Ed Schofield <schofield at ftw.at> wrote:
> Hi all,
>
> I recently moved two new packages, maxent and montecarlo, from the
> sandbox into the main SciPy tree.  I've now moved them back to the
> sandbox pending further discussion.  I'll introduce them here and ask
> for feedback on whether they should be included in the main tree.
>
> The maxent package is for fitting maximum entropy models subject to
> expectation constraints.  Maximum entropy models represent the 'least
> biased' models subject to given constraints.  When the constraints are
> on the expectations of functionals -- the usual formulation -- maximum
> entropy models take the form of a generalized exponential family.  A
> normal distribution, for example, is a maximum entropy distribution
> subject to mean and variance constraints.
>
> The maxent package contains one main module and one module with utility
> functions.  Both are entirely in Python.  (I have now removed the F2Py
> dependency.)  The main module supports fitting models on either small or
> large sample spaces, where 'large' means continuous or otherwise too
> large to iterate over.  Maxent models on 'small' sample spaces are
> common in natural language processing; models on 'large' sample spaces
> are useful for channel modelling in mobile communications, spectrum and
> chirp analysis, and (I believe) fluid turbulence.  Some simple examples
> are in the examples/ directory.  The simplest use is to define a list of
> functions f, an array of desired expectations K, and a sample space, and
> use the commands
>
> >>> model = maxent.model(f, samplespace)
> >>> model.fit(K)
>
> You can then retrieve the fitted parameters directly or analyze the
> model in other ways.
>
> I've been developing the maxent algorithms and code for about 4 years.
> The code is very well commented and should be straightforward to maintain.
>
>
> The montecarlo package currently does only one thing.  It generates
> discrete variates from a given distribution.  It does this FAST.  On my
> P4 it generates over 10^7 variates per second, even for a sample space
> with 10^6 elements.  The algorithm is the compact 5-table lookup sampler
> of Marsaglia.  The main module, called 'intsampler', is written in C.
> There is also a simple Python wrapper class around this called
> 'dictsampler' that provides a nicer interface, allowing sampling from a
> distribution with arbitrary hashable objects
> (e.g. strings) as labels instead of {0,1,2,...}.  dictsampler has
> slightly more overhead than intsampler, but is also very fast (around
> 10^6 per second for me with a sample space of 10^6 elements labelled
> with strings).  An example of using it to sample from this discrete
> distribution:
>
>     x       'a'       'b'       'c'
>     p(x)    10/180    150/180   20/180
>
> is:
>
> >>> table = {'a':10, 'b':150, 'c':20}
> >>> sampler = dictsampler(table)
> >>> sampler.sample(10**4)
> array([b, b, a, ..., b, b, c], dtype=object)
>
> The montecarlo package is very small (and not nearly as impressive as
> Christopher Fonnesbeck's PyMC package), but the functionality that is
> there would be an efficient foundation for many discrete Monte Carlo
> algorithms.
>
> I'm aware of the build issue Travis Brady reported with MinGW not
> defining lrand48().  I can't remember why I used this, but I'll adapt it
> to use lrand() instead and report back.
>
>
> Would these packages be useful?  Are there any objections to including them?
>
>
> -- Ed
>
>
>
>
> _______________________________________________
> Scipy-dev mailing list
> Scipy-dev at scipy.net
> http://www.scipy.net/mailman/listinfo/scipy-dev
>




More information about the Scipy-dev mailing list