[SciPy-user] Performance problem/suggestion for scikits.timeseries.convert

Abiel X Reinhart abiel.x.reinhart@jpmchase....
Tue Apr 7 11:17:29 CDT 2009

I have recently begun working with the useful scikits.timeseries package and noticed some performance issues in the ts.convert() function. For example, converting 1000 monthly values to a quarterly frequency using the ma.mean() function took me about 0.6 seconds. That isn't terrible, but it can definitely become an issue when working with many series or longer timespans.

After looking at the scikits.timeseries source code, I found that essentially all of the delay comes from the ma.apply_along_axis() call inside the _convert1d() function. I am not that familiar with the numpy functions, but it seems that ma.apply_along_axis can be rather slow. For instance, consider the following code:

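import numpy as np
import numpy.ma as ma

a = np.arange(300000).reshape(30000, 10)
b = ma.mean(a, -1)                       # one vectorized reduction over all rows
c = ma.apply_along_axis(ma.mean, -1, a)  # ma.mean called once per row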

In this example, b equals c, but b is computed much faster: my system consistently generated b in less than 0.02 seconds, but took about 4.3 seconds to generate c.
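A minimal sketch for reproducing the comparison. It uses a smaller 3000 x 10 array (my choice, so the slow paths finish quickly), and the explicit loop is only an approximation of the work ma.apply_along_axis does: one Python-level call to ma.mean per row. Absolute timings will vary with the machine and numpy version.

```python
import timeit

import numpy as np
import numpy.ma as ma

# Smaller than the 30000 x 10 array above so the slow paths finish quickly.
a = ma.masked_array(np.arange(30000).reshape(3000, 10))


def vectorized():
    # One call: the mean over the last axis is computed for all rows at once.
    return ma.mean(a, -1)


def via_apply_along_axis():
    # ma.mean is invoked separately for each of the 3000 rows.
    return ma.apply_along_axis(ma.mean, -1, a)


def explicit_loop():
    # Roughly the same work apply_along_axis does: a Python loop over rows.
    return ma.array([ma.mean(row) for row in a])


b = vectorized()
c = via_apply_along_axis()
d = explicit_loop()
assert ma.allclose(b, c) and ma.allclose(b, d)

for fn in (vectorized, via_apply_along_axis, explicit_loop):
    seconds = timeit.timeit(fn, number=3) / 3
    print(f"{fn.__name__:22s} {seconds:.4f} s")
```

All three produce the same values; the difference is only in how many times Python has to call back into ma.mean.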

Perhaps the convert() function could be improved by recognizing a standard set of built-in numpy functions such as ma.mean and applying them the way "b" is generated above, falling back to ma.apply_along_axis() only for custom functions. Since I imagine most people use standard aggregation functions like ma.mean and ma.sum, this could lead to a big speed improvement. I am building a GUI application, and this would make the difference between an application that reacts essentially instantly and one that hangs slightly in many situations.
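A rough sketch of that suggestion, written outside scikits.timeseries. The helper name reduce_rows and the set of recognized functions are hypothetical and not part of the package's API; the sketch assumes the data has already been grouped into a 2-D masked array with one row per output period.

```python
import numpy as np
import numpy.ma as ma

# Hypothetical whitelist: reductions that accept an axis argument and can
# therefore process every row in one vectorized call.
_VECTORIZED = {ma.mean, ma.sum, ma.min, ma.max}


def reduce_rows(func, data):
    """Apply func along the last axis of a 2-D masked array (hypothetical helper)."""
    data = ma.asarray(data)
    if func in _VECTORIZED:
        # Fast path: one call over the whole block, like b = ma.mean(a, -1).
        return func(data, axis=-1)
    # Fallback for custom functions: one Python call per row, like c above.
    return ma.apply_along_axis(func, -1, data)


blocks = ma.masked_array(np.arange(12.0).reshape(4, 3))
print(reduce_rows(ma.mean, blocks).tolist())              # -> [1.0, 4.0, 7.0, 10.0]
print(reduce_rows(lambda row: row[-1], blocks).tolist())  # custom: last value per row
```

Because the check is by identity, a user-supplied function that happens to compute a mean still takes the slow path; only the listed functions are sped up.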

Another possible solution is to leave scikits.timeseries unchanged and do something like the following:

Let t be a monthly time series.

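import numpy.ma as ma
import scikits.timeseries as ts

t = t.convert(freq="Q")  # without func, each quarter becomes a row of monthly values
t = ts.time_series(ma.mean(t, -1), freq="Q", start_date=t.start_date)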

The downside is that it's more verbose, and many users may not even think of it.
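The workaround is fast because the 2-D result holds one row per quarter, so a single ma.mean(t, -1) call reduces all quarters at once. A numpy-only sketch of the same idea, for illustration without scikits.timeseries: the function name monthly_to_quarterly_mean is hypothetical, and it assumes the data starts in the first month of a quarter and covers whole quarters only.

```python
import numpy as np
import numpy.ma as ma


def monthly_to_quarterly_mean(values):
    """Average monthly values into quarters (hypothetical helper).

    Assumes values start in the first month of a quarter and that the
    length is a multiple of 3; partial quarters are not handled.
    """
    data = ma.asarray(values, dtype=float)
    if data.size % 3:
        raise ValueError("length must be a multiple of 3 in this sketch")
    # One row per quarter, one column per month (the mask is reshaped too).
    blocks = data.reshape(-1, 3)
    # A single vectorized reduction over the last axis, as in ma.mean(t, -1).
    return ma.mean(blocks, -1)


monthly = ma.masked_array(np.arange(1.0, 13.0), mask=[0, 0, 1] + [0] * 9)
print(monthly_to_quarterly_mean(monthly).tolist())  # -> [1.5, 5.0, 8.0, 11.0]
```

Masked months are simply left out of their quarter's average, which is why the first quarter averages only 1.0 and 2.0.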

Thanks very much.

Abiel

