[SciPy-dev] genetic algorithm, number theory, filter design, zero finding

eric eric at scipy.org
Tue Apr 9 15:09:56 CDT 2002


>
> Hi,
>
> On Tue, 9 Apr 2002, Chuck Harris wrote:
>
> > Good enough. Would packaging things up with distutils be a good way to
> > go?
>
> Yes. In fact it is the only way to go. However, you may want to look at how
> packaging is done in scipy for its different submodules. Scipy uses
> scipy_distutils, which is derived from distutils. For more information, see
>
>   http://www.scipy.net/pipermail/scipy-dev/2002-January/000162.html

If you set things up in this format, that'd be great.  However, even if you
don't, it should be little work for one of us to convert a working standard
setup.py file.
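
For reference, a plain distutils setup.py is only a few lines, so the
conversion really is trivial.  Here's a minimal sketch (the package name,
description, and module names are made up purely for illustration):

    from distutils.core import setup

    setup(name="number_theory",           # hypothetical package name
          version="0.1",
          description="Hermite/Smith normal forms, LLL reduction, etc.",
          py_modules=["hnf", "snf", "lll"],  # hypothetical module names
          )
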
>
> > Also, are there any documentation guidelines? I assume that more
> > might be wanted than goes in """...""".
>
> Not that I know of. We need to work out these guidelines. I think we
> should also consider PEP 287 while doing this:
>
>   http://www.python.org/peps/pep-0287.html
>

I very much agree with this.  There are perhaps some shortcomings to this format
(some extra vertical whitespace), but otherwise, it looks well thought out.  We
struggled some with a standard format in the beginning, and never really came up
with one.  Initially we wanted to specify a reasonably long format for
doc-strings.  Something like (slightly modified from a current doc string in
scipy):

def vq(obs,code_book):
    """ Vector Quantization: assign features sets to codes in a code book.

        Description:
            Vector quantization determines which code in the code book best
            represents an observation of a target.  The features of each
            observation are compared to each code in the book, and assigned
            the one closest to it.  The observations are contained in the obs
            array. These features should be "whitened," or normalized by the
            standard deviation of all the features before being quantized.
            The code book can be created using the kmeans algorithm or
            something similar.
        Arguments:
            obs -- 2D array.
                    Each row of the array is an observation.  The
                    columns are the "features" seen during each observation.
                    The features must be whitened first using the
                    whiten function or something equivalent.
            code_book -- 2D array.
                    The code book is usually generated using the kmeans
                    algorithm.  Each row of the array holds a different
                    code, and the columns are the features of the code.
                                    #   c0    c1    c2   c3
                        code_book = [[  1.,   2.,   3.,   4.],  #f0
                                     [  1.,   2.,   3.,   4.],  #f1
                                     [  1.,   2.,   3.,   4.]]  #f2
        Outputs:
            code -- 1D array.
                    If obs is an N x M array, then a length N array
                    is returned that holds the selected code book index for
                    each observation.
            dist -- 1D array.
                    The distortion (distance) between each observation and
                    its nearest code.
        Caveats:
           This currently forces 32 bit math precision for speed.  Anyone know
           of a situation where this undermines the accuracy of the algorithm?
        Example:
            >>> code_book = array([[1.,1.,1.],
            ...                    [2.,2.,2.]])
            >>> features  = array([[  1.9,2.3,1.7],
            ...                    [  1.5,2.5,2.2],
            ...                    [  0.8,0.6,1.7]])
            >>> vq(features,code_book)
            (array([1, 1, 0],'i'), array([ 0.43588989,  0.73484692,  0.83066239]))

    """

This is great if you can do it (although this description could use some
help...).  However, I noticed when I was writing docs that I much preferred to
write (and was more likely to write...) a narrative form with a short
description and then pretty much everything else lumped into a paragraph or two.
Judging by everyone else's docstrings in SciPy, other people prefer the narrative
form as well.  All that said, don't let me discourage anyone from using
something like the above format (converted to be reStructuredText compatible).  I
think the users will appreciate it.  It's very nice to type:

    >>> from scipy.cluster.vq import vq
    >>> help(vq)
    <the above docstring spills out>

and get a description, easy to read input/output information, and an example of
use.

I really wish we had more examples in functions.  These should be in doctest
format so that examples can be automatically tested for correctness.
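
For what it's worth, the standard library already makes this painless.  A
throwaway script along these lines (the module name is just a placeholder)
picks up every ">>>" example in a module's docstrings, runs it, and compares
the output against what the docstring claims:

    # run_doctests.py -- illustrative only
    import doctest
    import vq_module   # placeholder for whatever module holds the docstrings

    doctest.testmod(vq_module, verbose=1)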

Would anyone be against adopting PEP 287 as the standard for docstrings in
SciPy?  happydoc and other tools are going to support it, so we'll be able to
generate a reference manual with relative ease (eventually).  I definitely don't
want to invent some new markup for this project that we have to maintain.  There
is more than enough maintenance here already, thank you.
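
Just to give a flavor, here is a rough, hypothetical sketch of how the start of
the vq docstring above might translate to reStructuredText-style markup (I
haven't checked this against the final PEP 287 conventions, so treat it purely
as an illustration):

    def vq(obs, code_book):
        """ Vector Quantization: assign feature sets to codes in a code book.

            :Arguments:
                - `obs`: 2D array; each row is an observation, each column a
                  whitened feature.
                - `code_book`: 2D array; each row is a code, usually generated
                  with the kmeans algorithm.

            :Returns:
                - `code`: 1D array of selected code book indices.
                - `dist`: 1D array of distortions (distances) to the nearest code.
        """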

Oh, one other thing.  I prefer the following indentation for doc-strings:

    def foo():
        """ initial description

            more description
        """

Instead of:

    def foo():
        """initial description

        more description
        """

I think it looks a little better, and has the added benefit that Scintilla-based
editors can fold the comments up (which they do based on indentation).

>
> > For testing, I'm guessing that there might be two kinds: first, to
> > check that the routines operate correctly on a given platform and
> > second, to check that the routines do what they are supposed to do.
>
> I don't see the difference.
>
> > Are there any testing guidelines?
>
> Yes, there are. See
>
>   http://www.scipy.org/site_content/tutorials/testing_guidelines
>
> These guidelines hold for the most part, except that the scipy_test module
> part has now moved to scipy_base/testing.py
>
> And note that all guidelines are a bit outdated due to fast development of
> scipy. If something seems to be inconsistent or is not working according
> to these guidelines then the best place to look is the source of scipy
> modules. And of course, scipy-dev can be helpful.

I can't stress the importance of the unit tests enough.  With the zillions of
platform combinations that people want to run SciPy on, they are our only
prayer of validating that the algorithms work.  After 0.2, I'm thinking the next
release will mainly be used to add tests and improve documentation.  It'd be
great to have full coverage (for some definition of "full").
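
Just to make the point concrete, a minimal test module in this spirit might
look roughly like the following.  I'm using the plain unittest module here
rather than the exact scipy_test/scipy_base.testing conventions, so the class
and method naming would need adapting to the guidelines:

    # test_vq.py -- illustrative sketch only
    import unittest
    from Numeric import array
    from scipy.cluster.vq import vq

    class test_vq(unittest.TestCase):
        def test_simple(self):
            code_book = array([[1., 1., 1.],
                               [2., 2., 2.]])
            features  = array([[1.9, 2.3, 1.7],
                               [0.8, 0.6, 1.7]])
            code, dist = vq(features, code_book)
            # first observation is nearest the second code, second the first
            self.assertEqual(list(code), [1, 0])

    if __name__ == "__main__":
        unittest.main()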

>
> > > > Number Theory : These were used for analysing arrays of antennas used
> > > > in radar interferometry. They are also useful in integer programming,
> > > > cryptography, and computational algebra.
> > > >
> > > > Reduction of a matrix to Hermite normal form
> > > > Reduction of a matrix to Smith normal form
> > > > LLL basis reduction
> > > > LLL basis reduction - deep version
> > > > Gram-Schmidt orthogonalization
> > >
> > > I am not sure where these should go when considering the current scipy
> > > state. As you mention, they are parts of different fields from what we
> > > have in scipy now. I think they should be parts of the corresponding
> > > packages that may not exist as of yet. Personally, I am very interested
> > > in CA stuff.
> >
> > These routines really need exact rational arithmetic in the general
> > case. The floating point versions here just happen to be 'good enough'
> > in many situations. Are there any plans to bring exact arithmetic into
> > scipy or numpy?
>
> I don't think that numpy is a proper place for this.
> And I don't think that everything useful should go under one
> umbrella such as SciPy.
> What I think is that Computational Algebra in Python should be a separate
> project.
>
> I have been experimenting with CA for Python for almost two years, trying
> out various approaches. Using Python 2.2 one can do serious symbol
> manipulations, but a pure Python implementation seems to be impractical -
> Python is too slow. Currently the most appropriate CA library to wrap for
> Python seems to be GiNaC (www.ginac.de). I have already done that, but the
> project is currently frozen until Boost.Python V2 becomes usable
> so that Python 2.2's new features can be fully exploited.
>
> But if only exact arithmetic is needed for your modules, then gmpy is the
> best thing to consider, because it wraps the very fast GMP library and it
> can also be made available on Windows platforms.
>
> > > > Zero finders: General use and fun. Bisection best for some special
> <snip>
> > I took a quick look at the Fortran code for the version of solve in
> > the minimization package. It seems to be a multidimensional form of
> > Newton's method, really the only way to go in higher dimensions unless
> > the function is holomorphic. The routines here are for the one
> > dimensional case and should be more robust in this situation. If anyone
> > has a more general idea of what goes on in the current zero
> > finder, I would like to hear about it.
>
> Some comments about Powell's hybrid method used by minpack/hybrd.f can be
> found, for example, in
>
>   http://www.empicasso.com/techdocs/p14_2.pdf
>
> > > > Genetic algorithm: Used in digital filter design to optimize for
> > > > coefficient truncation error. I looked at galib and found it easier
> > > > to roll my own, but didn't try for any great generality. I think it
> > > > would be good to include uniform crossover and to pull the fitness
> > > > function out of the genome --- in a tournament, fitness can depend
> > > > on the population. Perhaps it can all be made simpler.
> > >
> > > galib seems to have been developed for more than 6 years and I would
> > > expect it to be rather mature, though I have not used it myself. Maybe
> > > a wrapper to such a library would be more appropriate for the longer
> > > term. Though the licence may be an issue, galib seems to be GPL
> > > compatible.
>
> > I was sort of hoping for a ga guru to comment.
>
> OK. Do we have one in this list?
> I made my comment from the SciPy point of view: in addition to possible
> license issues, we must be careful when including new software in scipy. It
> must be the most efficient available and it must be actively maintained,
> preferably by specialists in the field, now and in the future.

I'm pretty comfortable with GAs.  My dissertation was on GAs for antenna and
circuit design.  The genetic algorithm in scipy (scipy.ga) was used for this
work.  I started with Matthew Wall's galib which is a very good C++ library.
Wrapping it was actually my introduction to Python extensions and SWIG.  It
worked fine, but the type checking of C++ quickly becomes a pain with GAs, so I
wrote scipy.ga as a replacement.  It shares many ideas with galib, but with one
central difference: the gene, rather than the genome, is the "atomic" building
block.  This is slower in general, but provides much more flexibility (mixing
all kinds of gene types within a single genome).  For the problems I was
interested in, the fitness function swamped the computational cost of the GA, so
speed wasn't an issue.  Specialized genomes could be made faster if people need
that.
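
To illustrate what I mean by that (this is a made-up sketch, not the actual
scipy.ga classes or method names), making the gene the atomic unit means each
gene carries its own type, bounds, and mutation behavior, and a genome is just
a list of heterogeneous genes:

    import random

    class float_gene:
        def __init__(self, lo, hi):
            self.lo, self.hi = lo, hi
            self.value = random.uniform(lo, hi)
        def mutate(self):
            self.value = random.uniform(self.lo, self.hi)

    class int_gene:
        def __init__(self, lo, hi):
            self.lo, self.hi = lo, hi
            self.value = random.randint(lo, hi)
        def mutate(self):
            self.value = random.randint(self.lo, self.hi)

    # A genome freely mixes gene types -- say a filter coefficient (float)
    # next to a coefficient bit width (int).
    genome = [float_gene(-1.0, 1.0), int_gene(8, 16)]
    for gene in genome:
        gene.mutate()

The cost, presumably, is that operators work gene-by-gene through Python-level
calls rather than on one flat array, hence the general slowdown.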

scipy.ga is reasonably full-featured.  The main feature I think is missing is
Pareto optimization stuff (and there are the beginnings of this).  I'm sure
other things are needed (more crossover, selection, etc. options), but they are
generally easy to add.

Right now, scipy.ga suffers from a lack of documentation and a lack of
attention.  It still looks pretty much like my research code.  While very
workable, it could definitely use an inter"face lift".  That's on the "todo" list
but hasn't been a priority as other things have seemed more important.  I doubt
this or even the next release will give it much attention.  Hopefully it'll get
cleaned up for 0.4 or 0.5.

eric





