[SciPy-user] calculations using the datetime information of timeseries
Pierre GM
pgmdevlist@gmail....
Wed Nov 12 19:35:57 CST 2008
Timmie,
Let's go through method #1 first:
> snew = series_dummy
>
> ###method 1
>
> for i in range(0,snew.size):
>     snew[i] = snew[i]* 2 #snew.dates[i].datetime
Your `snew` object is only a reference to `series_dummy`. When you
modify an element of `snew`, you're in fact modifying the corresponding
element of `series_dummy`. That's how assignment works in Python: it
binds a second name to the same object. You would get the same result
with lists:
>>> a = [0,0,0]
>>> b = a
>>> b[0] = 1
>>> a
[1, 0, 0]
If you want to avoid that, you can make `snew` a copy of `series_dummy`:
snew = series_dummy.copy()
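To see the difference between a reference and a copy without the timeseries package, here is a minimal sketch with a plain NumPy array standing in for the series. The names `base`, `alias` and `independent` are made up for the illustration.

```python
import numpy as np

base = np.ones(5)

# Plain assignment binds a second name to the same array object.
alias = base
alias[0] = 2.0
print(alias is base)   # True
print(base[0])         # 2.0, the change shows up under both names

# .copy() allocates new storage, so later changes stay local to the copy.
independent = base.copy()
independent[1] = 10.0
print(base[1])         # 1.0, base is untouched
```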
Now, method #2:
>
> for i in range(0,snew.size):
>     snew = snew*2
Are you sure that's what you want to do? The loop doubles the whole
series once per iteration, so you could do
snew = snew*(2**snew.size)
and get the same result.
Anyway: here, you change what `snew` is at each iteration. Initially, it
was a reference to `series_dummy`; after the first pass, it's a reference
to another (temporary) object, snew*2. The results are never propagated
back to `series_dummy`.
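A small sketch of that rebinding, again with a plain NumPy array as a stand-in for the series (the array length and the loop count of 3 are arbitrary choices for the example):

```python
import numpy as np

base = np.ones(4)
snew = base                # same object as base

for _ in range(3):
    snew = snew * 2        # builds a new array and rebinds the name snew

print(snew is base)        # False
print(base)                # still all ones
print(snew)                # every element is 2**3 = 8

# An in-place operator, by contrast, writes into the existing array.
alias = base
alias *= 2
print(base)                # now all twos
```

So whether the original changes depends on whether the statement modifies the object in place or creates a new one and reassigns the name.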
Finally, some comments for method #3:
You want to create a new timeseries based on the result of some
calculation on the data part, but still using the dates of the initial
series?
If you don't have any missing values, perform the computation on
series._data; that'll be faster. If you have missing values, use
series._series instead to access the MaskedArray methods directly, and
not the timeseries ones (you don't want to carry the dates around if
you don't need them).
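The trade-off between the raw data and the masked view can be sketched with numpy.ma on its own. This is only an illustration with made-up values: it uses a bare MaskedArray in place of a timeseries and does not show how to reattach the dates afterwards.

```python
import numpy as np
import numpy.ma as ma

# Stand-in for the data part of a series with one missing value.
values = ma.masked_array([1.0, 2.0, 3.0, 4.0],
                         mask=[False, True, False, False])

# Arithmetic on the MaskedArray keeps track of the missing entry.
scaled = values * 10.0
print(scaled.mask)      # the second entry is still masked
print(scaled.sum())     # 80.0, the masked entry is skipped

# Working on the underlying ndarray ignores the mask entirely, which is
# only safe when there are no missing values.
raw_scaled = values.data * 10.0
print(raw_scaled)
```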
As a wrap-up:
Try to avoid looping if you can. You said a generic form of your
function is:
>
> def myfunction(datetime_obj, scaling_factor):
>     pass
Do you really need datetime objects? In your example, you were reading
series.dates[i].datetime.hour one element at a time. You should have used
series.dates.hour, which is an array. Using functions on an array as a
whole is far more efficient than using the same functions on each
element of the array.
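As a rough sketch of the array-at-once approach, the example below uses NumPy's datetime64 type in place of the timeseries dates (an assumption made so the snippet runs without the timeseries package). The start date and the body of `myfunction` are placeholders; only the signature idea comes from your message.

```python
import numpy as np

# Hourly timestamps over two days (hypothetical start date).
times = np.arange("2008-11-12T00", "2008-11-14T00", dtype="datetime64[h]")

# Hour of day for every timestamp at once: subtract each timestamp
# truncated to its day, which leaves a timedelta in hours.
hours = (times - times.astype("datetime64[D]")).astype(int)

def myfunction(hour, scaling_factor):
    # Placeholder computation; works on a scalar or on a whole array.
    return hour * scaling_factor

# One call on the whole array instead of a Python loop over elements.
result = myfunction(hours, 10.0)
print(result[:5])
```

If the function only needs a few fields of the date (hour, day, month), passing those arrays is usually enough and no per-element datetime object has to be built.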
Let me know how it goes, and don't hesitate to contact me off-list if
you need some help with your function.
Cheers
P.
>
> I found out that I can get the datetime for each entry with
>
> for i in range(0, series.size):
>     series[i] = myfunction(series.dates.tolist()[i], 10.)
>
> Now, I noticed a strange thing.
>
> If I have a base series "base_series" and assign it to a new one with
>
> new_series = base_series
>
> The base_series gets updated/changed according to all calculations I
> perform on new_series (Please see method 1 below).
>
> The only way I could imagine to make my code work is creating lots of
> template series like in method 3 below. This way lets me calculate my
> new values in new_series using the datetime information and still
> retain base_series with its original values.
>
> I kindly ask you to shed some light on why the base_series gets
> changed when I change a derived series.
>
> Is there a more efficient way to accomplish my task that I may not
> have thought of so far?
>
> Thanks in advance!
> Kind regards,
> Timmie
>
>
>
> #### BELOW A SAMPLE SCRIPT THAT MAY ILLUSTRATE ####
>
> #!/usr/bin/env python
> # -*- coding: utf-8 -*-
>
> import datetime
> import scikits.timeseries as ts
>
> import numpy as np
>
> #create dummy series
> data = np.zeros(600)+1
> now = datetime.datetime.now()
> start = datetime.datetime(now.year, now.month, now.day)
> #print start
> start_date = ts.Date('H', datetime=start)
> #print start_date
> series_dummy = ts.time_series(data, dtype=np.float_, freq='H',
>                               start_date=start_date)
>
> snew = series_dummy
>
> ###method 1
>
> for i in range(0,snew.size):
>     snew[i] = snew[i]* 2 #snew.dates[i].datetime
>
> print "method 1:", snew.sum()-series_dummy.sum()
>
> ###method 2
>
> for i in range(0,snew.size):
>     snew = snew*2
>
> print "method 2:", snew.sum()-series_dummy.sum()
>
> #method 3:
>
> data = np.zeros(series_dummy.size)+1
> dt_arr = series_dummy.dates
> cser = ts.time_series(data.astype(np.float_), dt_arr)
> for i in range(0,cser.size):
>     # note: cser.dates[i].datetime.hour is just used as an example
>     # my function performs calculations based on the datetime of each
>     # data point (the current datetime is the input parameter).
>     cser[i] = cser.dates[i].datetime.hour
>
> print "method 3:", cser.sum()-series_dummy.sum()
>