[SciPy-dev] fblas tests

eric eric at scipy.org
Sat Apr 13 21:09:43 CDT 2002


Hey Pearu,

I was looking over the behavior of the new fblas wrappers and think we should
make some changes back to how the previous version worked (for reasons explained
below).  The old test suite is pretty extensive and covers many of the various
calling approaches that can occur.  We should definitely try to get all of its
tests to pass, as they exercise the interface fairly well.

The current blas wrappers sacrifice performance in an effort to make the
functions more Pythonic in there behavior and also reduce the chance of operator
error.  However, I consider the fblas and cblas very low level and aimed at
maximum performance.  If people are using them, they should have access to the
maximum speed that the underlying library can provide.  This means there
shouldn't be any extra copying if we can help it.  The result may be a less
"Pythonic" interface, but that is OK.  Pythonic approaches already exist for
most of the problems.  Consider scopy.  Someone wanting to use Pythonic
approaches for making strided copies from one array to another will use:

    >>> a[:8:4] = b[:6:3]

instead of:

    >>> fblas.scopy(a,b,n=2,incx=4,incy=3).

The person choosing the second would do so only for speed.  Also, they would
expect the values of b to get changed in the operation since scopy moves
portions of a into portions of b.  The current wrapper doesn't do this.  It
makes a copy of b, copies the values of a into it, and then returns this array.
It leaves b unaltered.

    >>> x = arange(8.,typecode=Float32)*10
    >>> y = arange(6.,typecode=Float32)
    >>> x
    array([  0.,  10.,  20.,  30.,  40.,  50.,  60.,  70.])
    >>> y
    array([ 0.,  1.,  2.,  3.,  4.,  5.],'f')
    # An output array is created with the changes to y instead of changing y in
place.
    >>> fblas.scopy(x,y,n=2,incx=4,incy=3)
    array([  0.,   1.,   2.,  40.,   4.,   5.],'f')
    # This guy should have been changed inplace
    >>> y
    array([ 0.,  1.,  2.,  3.,  4.,  5.],'f')

I did a comparison of this function with the Pythonic approach for very large
arrays and it was actually slower.  I removed the "copy" fromt the
"intent(in,copy,out) y" poriton of the interface definition, and scopy becomes
noticably faster (1.5 or more speed up) over the indexed copy approach.  It also
injects its values directly into b, so it removes the extra memory allocation
and copying.  Here is the output from my test:

C:\home\ej\wrk\scipy\linalg\build\lib.win32-2.1\linalg>python copy_test.py
python:  0.0879670460911
scopy -- without copy to output:  0.0521141653478
scopy -- with copy to output:  0.157154051702

I've included the script below.

Using this new approach, non-contiguous arrays passed into y will result in
exceptions.  That is OK though.  I think the experts that will use these
functions would rather force these functions to require contiguous input than to
have them handle the non-contiguous case at the expense of optimal performance.

So, I'm gonna work my way through the interface and try to switch the behavior
back to the old approach.  I think it only requires removing the "copy" argument
from most of the arguments.  Let me know if you think of other things I should
change.  Also, let me know if there are other pitfalls I'm not thinking of here.

thanks,
eric

---------------------
import time
from scipy_base import *
import fblas
import scipy.linalg.fblas

its = 1
l= 250000.
#l= 10.
x_size = 8.*l
y_size = 6.*l

x = arange(x_size)*10.
x = x.astype(Float32)
n = len(x) / 4.
y = arange(y_size,typecode=Float32)

t1 = time.clock()
for i in range(its):
    y[::3] = x[::4]
t2 = time.clock()
print 'python: ', t2 - t1
r_python = y.copy() # should really be y, but interface is different than
expected.

#-----------------------------------------------------------------------------

x = arange(x_size)*10.
x = x.astype(Float32)
n = len(x) / 4.
y = arange(y_size,typecode=Float32)

t1 = time.clock()
for i in range(its):
    z = fblas.scopy(x,y,n=n,incx=4,incy=3)
t2 = time.clock()
print 'scopy -- without copy to output: ', t2 - t1
r_nocopy = y.copy()

#-----------------------------------------------------------------------------

x = arange(x_size)*10.
x = x.astype(Float32)
n = len(x) / 4.
y = arange(y_size,typecode=Float32)

t1 = time.clock()
for i in range(its):
    z = scipy.linalg.fblas.scopy(x,y,n=n,incx=4,incy=3)
t2 = time.clock()
print 'scopy -- with copy to output: ', t2 - t1
r_copy = z.copy() # should really be y, but interface is different than
expected.

#-----------------------------------------------------------------------------

#print 'y:', y
#print 'z:', z
#print 'nocopy:', r_nocopy
#print 'python:', r_python
#assert(alltrue(r_nocopy == r_python))

# This illustrates why the copy is needed for a "robust" interface.
#x = arange(8.*100000.)*10.
#x = x.astype(Float32)
#n = len(x) / 4
#yy = arange(12.,typecode=Float32)
#z = fblas.scopy(x,y[::2],n=2,incx=4,incy=3)
#print 'y:', yy
#print 'z:', z

--
Eric Jones <eric at enthought.com>
Enthought, Inc. [www.enthought.com and www.scipy.org]
(512) 536-1057






More information about the Scipy-dev mailing list