[SciPy-dev] Generic polynomials class (was Re: Volunteer for Scipy Project)
Charles R Harris
Wed Oct 14 10:08:10 CDT 2009
On Tue, Oct 13, 2009 at 10:52 PM, <firstname.lastname@example.org> wrote:
> On Tue, Oct 13, 2009 at 10:38 PM, Charles R Harris
> <email@example.com> wrote:
> > On Tue, Oct 13, 2009 at 7:24 PM, Anne Archibald <
> > wrote:
> >> 2009/10/13 Charles R Harris <firstname.lastname@example.org>:
> >> >
> >> >
> >> > On Tue, Oct 13, 2009 at 5:55 PM, Anne Archibald
> >> > <email@example.com>
> >> > wrote:
> >> >>
> >> >> 2009/10/13 Charles R Harris <firstname.lastname@example.org>:
> >> >> >
> >> >> >> I'm not sure I see why returning all those NotImplementedErrors is
> >> >> >> problem - yes, the code needs to be written to override the
> >> >> >> that return NotImplementedError, but that code needs to be written
> >> >> >> no
> >> >> >> matter how we set up an interface to it.
> >> >> >
> >> >> > It's not trivial to do a complete workup. My template class is 698
> >> >> > lines
> >> >> > long, but it exists once in one place, and that does the trick for
> >> >> > *all*
> >> >> > the
> >> >> > classes.
> >> >>
> >> >> Aha. I think I see where the confusion is. In my proposed interface,
> >> >> the coefficient-oriented functions would *be* the ones in the Basis
> >> >> object. So, for example,
> >> >>
> >> >> chebadd(c1,c2,interval) -> ChebyshevBasis(interval).add(c1, c2)
> >> >>
> >> >> or, more likely, you'd have somewhere earlier
> >> >>
> >> >> cheb = ChebyshevBasis(interval)
> >> >>
> >> >> and then it's
> >> >>
> >> >> chebadd(c1,c2,interval) -> cheb.add(c1,c2)
> >> >>
> >> >> So the coefficient-oriented interface that we need to have is what
> >> >> lives in the basis objects, and it is what the Polynomial object uses
> >> >> to implement (e.g.) p1*p2. There would be no separate chebmul,
> >> >> polymul, lagrangemul, ....
> >> >>
> >> >
> >> > But the separate functions are what we want. They provide the most
> >> > flexible
> >> > low level implementation and make the fewest assumptions as to what
> >> > programmer might want to use them for. You don't really have a base
> >> > class,
> >> > you essentially have classes derived from different base classes,
> >> > isn't so different from what I am proposing.
> >> Do we really want the separate functions, rather than methods on a
> >> basis object? Why? I don't really see how they're more flexible or
> >> make fewer assumptions, and they become very cumbersome (and prevent
> >> some important optimizations) when working with polynomials in the
> >> Lagrange basis. It really can be the difference between chebadd(...)
> >> and cheb.add(...), in the (common?) case where people aren't
> > But why do that if you already have the name space when you import the
> > module? It's redundant.
> >> specifying the interval. And if people do want to specify the
> >> interval, using a basis object makes it much easier to avoid
> >> forgetting it and getting nonsensical results (or worse, sensible
> >> results until you try a different interval).
> > That is because they are low level building blocks, they should have the
> > absolute minimum of crud stuck on, and that means the interval is [-1,
> > If you need more, that is what a class is for. That how low level things
> > should work, small functions doing the minimum amount provide the most
> > flexibility. The programmer should be handed enough rope to hang
> > at that level if they so choose. That's why folks use C and not Pascal.
> >> In this respect, the current interface in chebyshev.py looks
> >> error-prone to me: you can't suppy an interval to chebadd and chebmul,
> >> but if you forget it for chebval you get bogus answers. (And in fact
> >> chebint and chebder are wrong, or at least misleading, if you use a
> >> different interval.)
> > Tracking things like intervals is best delegated to higher level
> > where you can deal with it without worrying about the low level
> > implementation at the same time. Of course there is some loop-de-loop
> > involved to get to the ideal dividing line, but it is important to draw
> > line.
> >> I'm not sure what you mean when you say I don't really have a base
> >> class: Basis is literally the base class for all Basis objects, and it
> >> provides nontrivial functionality (e.g. the power implementation) that
> >> would otherwise need to be repeated for each representation.
> >> Polynomial is the base class for all polynomials.
> >> >> The Basis object is effectively just a namespace holding the
> >> >> collection of functions, and possibly any data they need (interval,
> >> >> for example). So the separation you're describing is still present in
> >> >> my proposed interface; it's the separation between Basis objects
> >> >> (which contain the functions but no particular plumbing) and the
> >> >> Polynomial objects (which are all plumbing).
> >> >
> >> > Why not let the module hold the functions? It is the more natural
> >> > namespace
> >> > for python.
> >> Because there is important context - intervals, lists of specified
> >> points - that must be provided to all the coefficient-array functions.
> >> And collections of methods on objects are very natural in python.
> > That is higher level stuff and doesn't belong at the bottom, it should be
> > separate IMHO.The functions are the ingredients, not the cake. I've spent
> > too much time pulling apart classes to get to the reusable bits not to
> > them separated out up front. Now you could start with the separate
> > and then construct a basis to pass into the initializer. At that point
> > and I are operating along the same lines except I don't pass them to the
> > contructor, I put them in the local environment where the class methods
> > find them, that's what the code completion does and Python handles that
> > efficiently. They are consequently more like static (class) methods than
> > instance methods. That way I end up with a whole new class that uses
> > functions, inherits from nothing but object, and has a minimal
> > I'll admit that my class implementation may not be suitable for some of
> > other types of polynomials you are looking at, I hadn't been thinking
> > them yet. On the other hand it is a pretty complete implementation that
> > requires nothing more than the functions, raises no NotImplementedErrors
> > should work with most of the orthogonal polynomial bases. And it can be
> > reworked without reference to the low level functions.
> How expensive can _cseries_to_zseries, as_series and similar be?
> Since you currently have all main calculations in self-contained
> functions, you need to convert and check the inputs each time. If
> these were expensive intermediate results, then methods could save
> time in this. For short series, this doesn't seem to be a problem with
> polycheb, but inside a loop I'm not so sure.
> (as an aside: we were also thinking about moving some methods in
> statsmodels to functions, but in many cases we would need a larger
> number of function arguments of intermediate results or duplicate
> calculations, which makes it less useful.)
One reason for the low level functions is that they are short and can be
easily converted to cython functions or be library functions written in
fortran or whatever. The z series could be done that way, but at the cython
level they could probably be dispensed with and the underlying Cheybyshev
series used instead, that's one reason they are private functions. What they
brought to the python case was some vectorization of the algorithms. Another
way to look at them is as symbolic Fourier transforms.
Finding the proper collection of small routines can be hard work, which
seems like it might be the case in the statsmodels. Sometimes profiling will
reveal small bits that are bottlenecks and they are potential candidates.
Another view that helps is to separate the data shuffling and verification
to the python level and think of that as an interface to some underlying
routines in a faster language. I don't know enough about statsmodels to make
any more useful suggestions ;)
> I was playing a bit with the cats, and inheritance seems to work
> without problems.
Things can be done with inheritance, but in this case the need was for a
common mass produced interface and the desire to avoid work. Inheritance
isn't the only way to build interfaces, unless it happens to be the only
tool at hand. Templated classes are another way that is closer to the duck
typing model and they seem to work better in some circumstances. For
instance, in the C++ STL templated classes are provided so that you can
have, say, lists of different (uniform) types, but the lists all have the
same methods. That works better than trying to use inheritance. I think that
in general the templated class method is preferable because the design is
flatter and more flexible, and it doesn't use hokey virtual base classes and
such. Inheritance works best when you need to call a bunch of different
objects through a common base class, i.e., the objects are all "is a" base
class, and I don't think that requirement is there for the polynomial
classes. The temptation of inheritance is to use it for implementation, and
more often than not that leads to trouble down the line. The upshot is that
I was looking for the python equivalent of templated classes and completions
looked to be the best bet, so I gave it a shot to see how it looked. I think
it looks OK, but the functional approach is a bit strange to folks like
myself who have spent most of our lives working with imperative languages.
Following roughly the discussion, I would find it useful if the
> functions specify the used or assumed domain in the doc strings. Since
> I'm not very familiar with Chebyshev polynomials, I wouldn't know
> which functions would require rescaling of the domain.
Yes, that should be documented.
-------------- next part --------------
An HTML attachment was scrubbed...
More information about the Scipy-dev