[Numpy-discussion] [ANN]: Taylorpoly, an implementation of vectorized Taylor polynomial operations and request for opinions

Sebastian Walter sebastian.walter@gmail....
Sun Feb 28 16:52:50 CST 2010


On Sun, Feb 28, 2010 at 9:06 PM, Friedrich Romstedt
<friedrichromstedt@gmail.com> wrote:
> 2010/2/28 Sebastian Walter <sebastian.walter@gmail.com>:
>>> I think I can use that to make my upy accept arbitrary functions, but
>>> how do you apply sin() to a TTP?
>>
>> perform truncated Taylor expansion of  [y]_D = sin([x]_D), i.e.
>> y_d = d^d/dt^d  sin( \sum_{k=0}^{D-1} x_k t^k) |_{t=0}
>> to obtain an explicit algorithm.
>>
>>>
>>> One more question: You said, t is an "external paramter".  I, and
>>> maybe not only me, interpreted this as a complicated name for
>>> "variable".  So I assumed it will be a parameter to some method of the
>>> TTP.  But it isn't?  It's just the way to define the ring?
>
> I guess you overlooked this question?

thought this is a rhetorical question.
Tbh I don't know what the standard name for such a "formal variable" is.

>
>>> You could
>>> define it the same in Fourier space, except that you have to make the
>>> array large enough from the beginning?  Why not doing that, and
>>> saying, your computation relies on the Fourier transform of the
>>> representation?  Can this give insight why TTPs are a ring and why
>>> they have zero divisors?
>> it has zero divisors because for instance multiplication of the two
>> polynomials t*t^{D-1}
>> truncated at t^{D-1} yields is zero.
>
> Yes, but I wanted to have a look from Fourier space of view.  Because
> there everything is just a multiplication, and one does not have to
> perform the convolution in mind.  I have to give up here.  In fact, I
> do not really understand why my approach also works with DFT and not
> only analytically with steady FT.  Consider the snippet:
>
>>>> p1 = gdft_polynomials.Polynomial([1])
>>>> p1.get_dft(3)
> array([ 1.+0.j,  1.+0.j,  1.+0.j])
>>>> p2 = gdft_polynomials.Polynomial([0, 1])
>>>> p2.get_dft(3)
> array([ 1.0+0.j       , -0.5+0.8660254j, -0.5-0.8660254j])
>>>> p2.get_dft(4)
> array([  1.00000000e+00 +0.00000000e+00j,
>         6.12303177e-17 +1.00000000e+00j,
>        -1.00000000e+00 +1.22460635e-16j,  -1.83690953e-16 -1.00000000e+00j])
>
> As one increases the number of padding zeros, one increases Fourier
> space resolution, without affecting result:
>
>>>> p3 = gdft_polynomials.Polynomial([0, 1, 0, 0, 0])
>>>> print p2 * p2
> Polynomial(real part =
> [  1.85037171e-16  -7.40148683e-17   1.00000000e+00]
> imaginary part =
> [  2.59052039e-16   7.40148683e-17   0.00000000e+00])
>>>> print p2 * p3
> Polynomial(real part =
> [  1.66533454e-16   1.48029737e-16   1.00000000e+00  -7.40148683e-17
>  -4.44089210e-16  -3.70074342e-17]
> imaginary part =
> [  9.25185854e-17   1.48029737e-16   2.96059473e-16   1.11022302e-16
>  -3.70074342e-16  -1.44497045e-16])
>>>>
>
> It's a bit of mystery to me.  Of course, one can argue, well, DFT is
> information maintaining, and thus one can "feel" that it should work,
> but this is only a gut feeling.

I'm of no help here, I'm not familiar enough with the DFT. All I know is that
F( conv(x,y)) = F(x) * F(y) and that one can speed up the convolution
in that way.
And most operations on truncated Taylor polynomials result in
algorithms that contain convolutions.


>
>>>>  E.g. if you have a function
>>>> f: R^N -> R
>>>> x -> y=f(x)
>>>> where x = [x1,...,xN]
>>>>
>>>> and you want to compute the gradient g(x) of f(x), then you can compute
>>>> df(x)/dxn by propagating  the following array of Taylor polynomials:
>>>>
>>>> x = numpy.array( UTPS([x1_0, 0]), ..., UTPS([xn_0, 1]), ...,
>>>> UTPS([xN_0,0]), dtype=object)
>>>> y = f(x)
>>>
>>> So what is the result of applying f to some UTPS instance, is it a
>>> plain number, is an UTPS again?  How do you calculate?
>>>
>>> Can one calculate the derivative of some function using your method at
>>> a certain point without knowledge of the analytical derivative?  I
>>> guess that's the purpose.
>> Yes, that's the whole point: Obtaining (higher order) derivatives of
>> functions at machine precision for which no symbolic representation is
>> readily available.
>> That includes computer codes with recursions (e.g. for loops) that are
>> a no-go for symbolic differentiation. Supposedly (I've never done
>> that) you can even differentiate Monte Carlo simulations in that way.
>
> http://en.wikipedia.org/wiki/Automatic_differentiation:
> "Both classical methods have problems with calculating higher
> derivatives, where the complexity and errors increase. Finally, both
> classical methods are slow at computing the partial derivatives of a
> function with respect to many inputs, as is needed for gradient-based
> optimization algorithms. Automatic differentiation solves all of these
> problems."
>
> Yeah.
>
> Note at this point, that there is no chance for your project to be
> integrated in scipy, because you maybe HAVE TO PUBLISH UNDER GPL/CPAL
> (the ADOL-C is licensed GPL or CPAL).  I cannot find CPL on
> www.opensource.org, but I guess it has been renamed to CPAL?  Anyway,
> CPAL looks long enough to be GPL style ;-).  I also published my
> projects under GPL first, and switched now to MIT, because Python,
> numpy, scipy, matplotlib, ... are published under BSD kind too, and in
> fact I like MIT/BSD more.  Please check if your aren't violating GPL
> style licenses with publishing under BSD style.

AFAIK the CPL is basically the Eclipse Public License (
http://en.wikipedia.org/wiki/Eclipse_Public_License)
which explicitly allows software under another licence to link to CPL code.
The Python wrapper of ADOL-C is only linking to ADOL-C thus it can be
BSD licensed.
How is this related to the Taylor polynomials?

>
>>>>  E.g. if you have a function
>>>> f: R^N -> R
>>>> x -> y=f(x)
>>>> where x = [x1,...,xN]
>>>>
>>>> and you want to compute the gradient g(x) of f(x), then you can compute
>>>> df(x)/dxn by propagating  the following array of Taylor polynomials:
>>>>
>>>> x = numpy.array( UTPS([x1_0, 0]), ..., UTPS([xn_0, 1]), ...,
>>>> UTPS([xN_0,0]), dtype=object)
>>>> y = f(x)
>
> But doesn't the call f(x) with x.shape = (N,) result in an array too?
> But you want a scalar number?

Since x[0] is a scalar and not a 0-dimensional ndarray I conjecture
that a function mapping to the real numbers
should be implemented to return a scalar, not an array.

>
>>>> if you want to have the complete gradient, you will have to repeat N
>>>> times. Each time for the same zero'th coefficients [x1,...,xN].
>>>>
>>>> Using the vecorized version, you would do only one propagation
>>>> x = numpy.array( UTPS([x1_0, 1,0,...,0]), ..., UTPS([xn_0,
>>>> 0,...,1,...0]), ..., UTPS([xN_0,0,....,1]), dtype=object)
>>>> y = f(x)
>>>>
>>>> i.e. you save the overhead of calling the same function N times.
>>>
>>> Ok, I understand.  Today it's too late, I will reason tomorrow about it.
>
> I think I grasped the idea.  But the thing is really tricky.
>
> I thought:
> So UTPS is not the thing you implemented, but you implemented rather
> the complete array.  Right?

I don't get your question.

> But it's maybe wrong:
>
> UTPS([x1_0, 1, 0, ..., 0]) is with D = 1 and P = N (f: R^N -> R).
> I.e., P = N polynomials of degree 1, for calculating the first-order
> derivative?  That's why your question (1) from Feb 27: What to hand
> over?  I would say, make it possible to hand over an (P, N) ndarray.
> It will increase impact of your module (and graspableness)
> dramatically.  And you have an indication how to interpret the array
> handed over without additional init args to UTPS().

Well, there is a problem with the (P,D) approach.
What if there is an comparison operator, e.g. [x]_D > 0 in your code?
It is defined as x_0 > 0. Now, if a single UTPS contains P zero
coefficients x_{10},...,x_{P0} then at this point the computer program
would have to branch. Hence there must be only one x_0 in an UTPS
instance.

If it wasn't for this branching I'd implemented it as (P,D) array.
I thought about providing the possibility to the user to use an (P,D)
array as input.
However, this would mean to hide crucial information about the inner
workings of the algorithms from the users.
This is almost always a very bad idea.
Also, it would violate the "only one way to do it" principle of Python.
As Einstein said: Make things as simple as possible, but not any simpler ;)

But it's good that you point that out. I agree that it would be nice
to be more elegant.

>
> I think the nominal value is always stored in x_{i, 0}, am I wrong?
there is only one zero'th coefficient x_0 for all directions P.
>
> I'm not shure what to use as initial UTPSs.  Can you make it possible
> that one doesn't have to think about that?  What would be great, if I
> have a target function f(a, b, c, ...) stored, to hand over instead of
> ordinary numbers objects from your package, and everything works out
> such that I end up in the result with an object where both the nominal
> value is stored as also the gradient.  Is that feasible?  You could
> also monkey-patch numpy.sin etc. by a replacement calling the original
> numpy.sin with the nominal values but also doing the ttp job.

The taylorpoly module is supposed to be a basic building block for AD
tools and other software that relies on local polynomial
approximations (e.g. Taylor series intergrators for ODEs/DAEs).
There are quite a lot of AD tools available in Python. I use pyadolc
on a regular basis and it turned out to be very reliable so far. It is
also quite fast (probably factor 100 faster than pure Python based AD
tools).

numpy.sin(x) is smart enough to call x.sin() if x is an object.

>
>>> I guess it's not overhead.  The number of calculations should be in
>>> equilibrium at very low D, am I wrong?  And you win to not have to
>>> compile a C library but only native text Python code.
>>
>> Well, I'm really no expert on the DFT. But doesn't the  DFT compute on
>> the complex numbers? you'll have extra overhead (let's say factor >=
>> 2?)
>> And as far as I can tell,  you do the computations on padded arrays
>> which possibly introduces cache misses (maybe another factor 2?)
>
> What are "cache misses"?
now that I think of it, the size of the polynomials are much to small
not to fit into the cache...
A cache miss occurs when a required memory block for the next
operation on the CPU is not in the cache and has to be transfered from
the main memory. Getting from the main memory is a process with high
latency and relatively low bandwidth. I.e. the cpu is waiting, i.e.
your algorithm operates significantly below peak performance.
But anyway, forget about it. In this case it should be irrelevant.

>
>> Isn't the advantage of the DFT not that you can use the FFT which
>> would reduce the runtime from O(D^2) to O(D log(D))?
>> I'm pretty sure that only pays off for D larger than 10.
>
> Your algorithm stays at O(D^2) as you do the convolution by hand, no?
Yes, its O(D^2).
>
>>> E.g., your
>>> optimisation package is quite interesting for me, but I'm using
>>> Windows as my main system, so it will be painful to compile.  And the
>>> code is more straightforward, more concise, easier to maintain and
>>> easier to understand, ... :-)  I really do not want to diminish your
>>> programming skills, please do not misunderstand!  I only mean the
>>> subject.
>> The project uses scons which is available for windows as binaries.
>> I haven't tried it myself but I'm confident that it's a 1 minutes job
>> on windows.
>
> The optimisation package or utp?  I want to give utp a try.

to what do you refer to by "optimization package"?

>
>> I have implemented some of the algorithms just as you explained in
>> another package
>> (http://github.com/b45ch1/algopy/blob/master/algopy/utp/utps.py).
>> But I don't think the code looks easier to maintain than the C code
>> and it's also slower.
>
> Can you explain which repo I should clone at best?  What are the
> differences between algopy, taylorpoly and pyadolc?

pyadolc is the wrapper of the very mature C++ software ADOL-C. It's
quite fast and reliable.

algopy is a research prototype with focus on higher order
differentation of linear algebra functions (dot, inv, qr, eigh, trace,
det, ...). This is done by propagation of univariate Taylor
polynomials with numpy.arrays as coefficients.

taylorpoly is supposed to provide building blocks. E.g. I plan to use
these algorithms also in algopy.
I hope that the software I write is deemed valuable enough to be
included into scipy or preferably numpy.

>
> I'm getting a bit confused slowly with all the mails in this thread,
> and also the subject isn't that easy ...
> http://en.wikipedia.org/wiki/Automatic_differentiation also refers to
> TTP only in a marginal note :-(
One could/should improve the wikipedia article, I guess.
>
> Friedrich
> _______________________________________________
> NumPy-Discussion mailing list
> NumPy-Discussion@scipy.org
> http://mail.scipy.org/mailman/listinfo/numpy-discussion
>


More information about the NumPy-Discussion mailing list