[SciPy-Dev] Seeking help/ advice for applying functions

Anne Archibald peridot.faceted@gmail....
Tue Mar 9 12:12:53 CST 2010


On 9 March 2010 09:59, eat <e.antero.tammi@gmail.com> wrote:
> Robert Kern <robert.kern <at> gmail.com> writes:
>
>>
>> Your example is not very clear. Can you write a less cryptic one with
>> informative variable names and perhaps some comments about what each
>> part is doing?
>>
>
> """
> Hi,
>
> I have tried to clarify my code. First part is the relevant one, the rest is
> there just to provide some context to run some tests.

The short answer is that no, there's no way to optimize what you're doing.

The long answer is: when numpy and scipy are fast, they are fast
because they avoid running python code: if you add two arrays, there's
only one line of python code, and all the work is done by loops
written in C. If your code is calling many different python functions,
well, since they're python functions, to apply them at all you must
necessarily execute python code. There goes any potential speed
advantage. (If what you want is convenience, look into np.vectorize:
it is just a wrapper around a Python loop, so it is no faster, but it
lets you call a scalar function on whole arrays.)
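
A minimal sketch of that difference, assuming a made-up input array `x`
and a hypothetical wrapper name `slow_sin` (neither is from the original
post). It only illustrates where the loop runs; it is not a benchmark.

```python
import math
import numpy as np

# Made-up input, for illustration only.
x = np.linspace(0.0, 1.0, 1000)

# One line of Python; the loop over the 1000 elements runs in C.
fast = np.sin(x)

# np.vectorize gives array-style calling convenience, but it still calls
# the Python-level function once per element, so expect no speed-up.
slow_sin = np.vectorize(math.sin)
slow = slow_sin(x)

# Both approaches produce the same values.
same = np.allclose(fast, slow)
```

Timing the two calls (for example with the timeit module) should show
the gap on your own machine.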

That said, I assume you are considering numpy/scipy because you have
arrays of thousands or more. It also seems unlikely that you actually
have thousands of different functions (that's an awful lot of source
code!). So if your "different" functions are actually just a handful
(or fewer) pieces of actual code, and you are getting your thousands
of functions by wrapping them up with parameters and local variables,
well, now there are possibilities. Exactly which possibilities depends
on what your functions look like - which is one reason Robert Kern
asked you to clarify your code - but they all boil down to rearranging
the problem so that it goes back to "few functions, much data", then
writing the functions in such a way that you can use numpy to apply
them to thousands or millions of data points at once.
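
One way such a rearrangement might look, as a sketch only. It assumes
the "thousands of functions" are all closures of the form
sin(a*x + b) with different a and b; that form, and the names
`make_function`, `apply_loop` and `apply_vectorized`, are hypothetical
and not taken from the original code.

```python
import numpy as np

def make_function(a, b):
    """Hypothetical closure: one piece of code, many parameter sets."""
    def f(x):
        return np.sin(a * x + b)
    return f

def apply_loop(a, b, data, mask):
    """'Many functions' version: one Python call per selected function."""
    functions = [make_function(ai, bi) for ai, bi in zip(a, b)]
    selected = [f for f, keep in zip(functions, mask) if keep]
    return np.asarray([f(data) for f in selected]).T

def apply_vectorized(a, b, data, mask):
    """'Few functions, much data' version: the parameters become arrays."""
    # data[:, None] has shape (n_data, 1); a[mask] has shape (n_selected,).
    # Broadcasting evaluates every selected parameter set in one call.
    return np.sin(a[mask] * data[:, None] + b[mask])

rng = np.random.RandomState(123)
n_functions = 1000
a = rng.uniform(0.5, 2.0, n_functions)
b = rng.uniform(0.0, np.pi, n_functions)
data = rng.random_sample(50)

# Select 1% of the functions, deterministically, for the demonstration.
mask = np.zeros(n_functions, dtype=bool)
mask[::100] = True

loop_result = apply_loop(a, b, data, mask)
vec_result = apply_vectorized(a, b, data, mask)
```

Both results have one row per data point and one column per selected
parameter set; the second version gets there without a Python-level
loop over the functions.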

> Also originally I should have posted this to Scipy-User list. Would it be
> more appropriate to continue the discussion there?

Probably.


Anne

> Regards,
> eat
> """
>
> import numpy as np
>
> ## relevant part
> # expect indices to be sparse (~1% selected) out of hundreds of thousands of elements
> # also, the functions are not necessarily builtins, and data may be large
> def how1(functions, data):
>    """ first approach to apply data to functions"""
>    def how(indicies):
>        return np.asarray([f(data) for f in functions[indicies]]).T
>    return how
>
>
> def how2(functions, data):
>    """ second approach to apply data to functions"""
>    def rr(data):
>        """ reverse the 'roles' of data and function"""
>        def f(g):
>            return g(data)
>        return f
>    daf= rr(data)
>    def how(indicies):
>        return np.asarray(list(map(daf, functions[indicies]))).T
>    return how
>
> # as I understand it, how1 and how2 boil down under the hood to pretty
> # much the same code, so the differences are only syntactic, right?
>
> # and this is the key question: does there exist a more suitable
> # scipy/ numpy solution for this situation?
>
> # perhaps some kind of special vectorization as below (as pseudo code)?
> def how3(functions, data):
>    """ third approach to apply data to functions"""
>    def vecme(functions, data, indicies):
>        return functions[indicies](data)
>    v= np.vectorize(vecme)
>    def how(indicies):
>        return np.asarray(v(functions, data, indicies)).T
>    return how
>
> ## end relevant part
> # rest is just some context where hows could be applied
> def stream(m, n):
>    """ mimic some external stream of 0/1 indicators"""
>    np.random.seed(123)
>    ind= np.asarray(np.random.randint(0, 2, (m, n)), dtype= bool)
>    for k in range(m):
>        yield ind[k, :]
>
> def process(stream, how):
>    """ consume stream"""
>    for ind in stream:
>        yield how(ind)
>
> def run(hows, n):
>    """run the hows"""
>    for app in hows.keys():
>        print('approach: %s' % app)
>        for r in process(stream(3, n), hows[app]):
>            print(np.round(r.squeeze(), 2))
>
> if __name__ == '__main__':
>    # some data and functions only as demonstration
>    data= np.random.random((3, 1))
>    fncs= np.asarray([np.sin, np.cos, np.tan, np.sinh, np.cosh, np.tanh])
>
>    # produce equivalent results
>    hows= {'first': how1(fncs, data),
>           'second': how2(fncs, data)}
> #           'third': how3(fncs, data)}
>    run(hows, len(fncs))
>
>
> _______________________________________________
> SciPy-Dev mailing list
> SciPy-Dev@scipy.org
> http://mail.scipy.org/mailman/listinfo/scipy-dev
>

