[SciPy-User] re[SciPy-user] moving for loops...
Benjamin Root
ben.root@ou....
Thu Jun 10 16:08:00 CDT 2010
Good! The -1 in the reshape means "however many it takes" to have a correct
reshaped array.
Of course, as always with developing code that uses reshape operations, do
some "sanity checks" to make sure that your array was reshaped in a manner
you expect. When you have many-dimension arrays, it is very easy to make a
mistake with reshape. Print out a few slices of the array and/or see if the
averages make sense until you are convinced that you coded it correctly.
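
A minimal sketch of such a sanity check, using a made-up index array in place of real data (the names num_years, num_months and series are illustrative only, not from the original code):

```python
import numpy as np

num_years, num_months = 11, 12
# Stand-in for a monthly time series: element i is "month i since the start".
series = np.arange(num_years * num_months)

inferred = series.reshape((-1, num_months))         # NumPy works out the 11
explicit = series.reshape((num_years, num_months))  # same thing, spelled out
assert inferred.shape == (11, 12)
assert np.array_equal(inferred, explicit)

# Sanity check: row = year, column = month, so entry [y, m] must be
# the original element y * 12 + m.
year, month = 3, 7
assert inferred[year, month] == series[year * num_months + month]

# -1 only works when the sizes divide evenly; otherwise reshape raises.
try:
    series[:-1].reshape((-1, num_months))
    reshape_failed = False
except ValueError:
    reshape_failed = True
```

So for this data the -1 is equivalent to writing 11 (or numyears) explicitly; it just saves you from hard-coding the number of years.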
Ben Root
On Thu, Jun 10, 2010 at 3:36 PM, mdekauwe <mdekauwe@gmail.com> wrote:
>
> OK I think it is clear now!! Although what does the -1 bit do? Surely this is
> the same as saying 11, 12 or numyears, nummonths?
>
> thanks.
>
>
>
> Benjamin Root-2 wrote:
> >
> > Well, let's try a more direct example. I am going to create a 4d array of
> > random values to illustrate. I know the length of the dimensions won't be
> > exactly the same as yours, but the example will still be valid.
> >
> > In this example, I will be able to calculate *all* of the monthly averages
> > for *all* of the variables for *all* of the grid points without a single
> > loop.
> >
> >> jules = np.random.random((132, 10, 50, 3))
> >> print jules.shape
> > (132, 10, 50, 3)
> >
> >> jules_5d = np.reshape(jules, (-1, 12) + jules.shape[1:])
> >> print jules_5d.shape
> > (11, 12, 10, 50, 3)
> >
> >> jules_5d = np.ma.masked_array(jules_5d, mask=jules_5d < 0.0)
> >
> >> jules_means = np.mean(jules_5d, axis=0)
> >> print jules_means.shape
> > (12, 10, 50, 3)
> >
> > voila! This matrix has a mean for each month across all eleven years for
> > each datapoint in each of the 10 variables at each (I am assuming) level
> > in the atmosphere.
> >
> > So, if you want to operate on a subset of your jules matrix (for example,
> > you need to do special masking for each variable), then you can just work
> > off of a slice of the original matrix, and many of the same concepts from
> > this example and the previous one still apply.
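
The slice-based variant can be sketched as follows. This is only an illustration under assumptions: the data are random stand-ins, VAR = 1 and level 0 are hypothetical choices, and the explicit loop is included purely as a cross-check of the reshape.

```python
import numpy as np

rng = np.random.default_rng(1)
num_years, num_months, num_pts = 11, 12, 50
# Hypothetical stand-in for the real data: (time, variable, point, level).
jules = rng.uniform(-0.2, 1.0, size=(num_years * num_months, 10, num_pts, 3))
VAR = 1  # hypothetical variable index

# Slice one variable at one level, then split time into (year, month).
sub = jules[:, VAR, :, 0].reshape((-1, num_months, num_pts))
sub = np.ma.masked_array(sub, mask=sub < 0.0)
monthly = sub.mean(axis=0)  # shape (12, num_pts)

# Cross-check against an explicit loop over the months.
looped = np.empty((num_months, num_pts))
for month in range(num_months):
    tmp = jules[month::num_months, VAR, :, 0]
    tmp = np.ma.masked_array(tmp, mask=tmp < 0.0)
    looped[month, :] = tmp.mean(axis=0).filled(np.nan)

assert monthly.shape == (num_months, num_pts)
assert np.allclose(monthly.filled(np.nan), looped, equal_nan=True)
```

If the two results agree on a small test like this, the reshape is splitting the time axis the way you intended.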
> >
> > Ben Root
> >
> >
> > On Thu, Jun 10, 2010 at 1:08 PM, mdekauwe <mdekauwe@gmail.com> wrote:
> >
> >>
> >> Hi,
> >>
> >> No, if I am honest I am a little confused about how what you are suggesting
> >> would work. As I see it, the array I am trying to average from has dims
> >> jules[(numyears * nummonths),1,numpts,0], where the first dimension (132)
> >> is 12 months x 11 years. And as I said before, I would like to average the
> >> Jan from the first, second, third years etc., then the same for the Feb and
> >> so on.
> >>
> >> So I don't see how you get to the 2d array that you mention in the first
> >> line? I thought what you were suggesting was that I could precompute the
> >> step that builds the index for the months, e.g.
> >>
> >> mth_index = np.zeros(0)
> >> for month in xrange(nummonths):
> >>     mth_index = np.append(mth_index, np.arange(month, numyears * nummonths,
> >>                                                nummonths))
> >>
> >> and use this as my index to skip the for loop. Though I still have a for
> >> loop I guess!
> >>
> >>
> >>
> >>
> >>
> >>
> >> Benjamin Root-2 wrote:
> >> >
> >> > Correction for me as well. To mask out the negative values, use masked
> >> > arrays. So we will turn jules_2d into a masked array (second line), then
> >> > all subsequent commands will still work as expected. It is very similar
> >> > to replacing negative values with nans and using nanmin().
> >> >
> >> >> jules_2d = jules.reshape((-1, 12))
> >> >> jules_2d = np.ma.masked_array(jules_2d, mask=jules_2d < 0.0)
> >> >> jules_monthly = np.mean(jules_2d, axis=0)
> >> >> print jules_monthly.shape
> >> > (12,)
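
The masked-array route and the NaN route mentioned above can be compared directly. This sketch assumes that, for averages, the NaN-aware counterpart is np.nanmean (present in later NumPy releases); the data are random stand-ins with some negative values playing the role of missing-data flags.

```python
import numpy as np

rng = np.random.default_rng(0)
# Hypothetical data: 11 years x 12 months, with some negative "missing" flags.
jules = rng.uniform(-0.2, 1.0, size=132)
jules_2d = jules.reshape((-1, 12))

# Approach 1: mask the negatives, then average over the year axis.
masked = np.ma.masked_array(jules_2d, mask=jules_2d < 0.0)
monthly_masked = masked.mean(axis=0)

# Approach 2: replace negatives with NaN and use a NaN-aware mean.
with_nans = np.where(jules_2d < 0.0, np.nan, jules_2d)
monthly_nan = np.nanmean(with_nans, axis=0)

assert monthly_masked.shape == (12,)
assert np.allclose(np.ma.filled(monthly_masked, np.nan), monthly_nan,
                   equal_nan=True)
```

Either way the negative entries are simply left out of each month's average.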
> >> >
> >> > Ben Root
> >> >
> >> > On Tue, Jun 8, 2010 at 7:49 PM, Benjamin Root <ben.root@ou.edu> wrote:
> >> >
> >> >> The np.mod in my example caused the data points to stay within [0, 11] in
> >> >> order to illustrate that these are months. In my example, months are
> >> >> columns, years are rows. In your desired output, months are rows and
> >> >> years are columns. It makes very little difference which way you have it.
> >> >>
> >> >> Anyway, let's imagine that we have a time series of data "jules". We can
> >> >> easily reshape this like so:
> >> >>
> >> >> > jules_2d = jules.reshape((-1, 12))
> >> >> > jules_monthly = np.mean(jules_2d, axis=0)
> >> >> > print jules_monthly.shape
> >> >> (12,)
> >> >>
> >> >> voila! You have 12 values in jules_monthly, which are the means for each
> >> >> month across all years.
> >> >>
> >> >> protip - if you want yearly averages just change the axis parameter in
> >> >> np.mean():
> >> >> > jules_yearly = np.mean(jules_2d, axis=1)
> >> >>
> >> >> I hope that makes my previous explanation clearer.
> >> >>
> >> >> Ben Root
> >> >>
> >> >>
> >> >> On Tue, Jun 8, 2010 at 5:41 PM, mdekauwe <mdekauwe@gmail.com> wrote:
> >> >>
> >> >>>
> >> >>> OK...
> >> >>>
> >> >>> but if I do...
> >> >>>
> >> >>> In [28]: np.mod(np.arange(nummonths*numyears), nummonths).reshape((-1, nummonths))
> >> >>> Out[28]:
> >> >>> array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
> >> >>> [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
> >> >>> [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
> >> >>> [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
> >> >>> [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
> >> >>> [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
> >> >>> [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
> >> >>> [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
> >> >>> [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
> >> >>> [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
> >> >>> [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]])
> >> >>>
> >> >>> When really I would be after something like this I think?
> >> >>>
> >> >>> array([[ 0, 12, 24, 36, 48, 60, 72, 84, 96, 108, 120],
> >> >>> [ 1, 13, 25, 37, 49, 61, 73, 85, 97, 109, 121],
> >> >>> [ 2, 14, 26, 38, 50, 62, 74, 86, 98, 110, 122],
> >> >>> etc, etc
> >> >>>
> >> >>> i.e. so for each month jump across the years.
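
For what it is worth, the "months as rows" layout described here is just the transpose of the (year, month) reshape. A small sketch, with num_years and num_months as illustrative names:

```python
import numpy as np

num_years, num_months = 11, 12
idx = np.arange(num_years * num_months)

by_year = idx.reshape((-1, num_months))  # rows = years, columns = months
by_month = by_year.T                     # rows = months, columns = years

assert by_month.shape == (12, 11)
assert by_month[0].tolist() == [0, 12, 24, 36, 48, 60, 72, 84, 96, 108, 120]
assert by_month[1].tolist() == [1, 13, 25, 37, 49, 61, 73, 85, 97, 109, 121]

# Averaging over years gives the same 12 numbers in either layout;
# only the axis argument changes.
assert np.allclose(by_year.mean(axis=0), by_month.mean(axis=1))
```

So the two layouts carry the same information, and the choice only affects which axis you average over.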
> >> >>>
> >> >>> Not quite sure of this example...this is what I currently have, which does
> >> >>> seem to work, though I guess not completely efficiently.
> >> >>>
> >> >>> for month in xrange(nummonths):
> >> >>>     # start at `month` so each pass picks that month from every year
> >> >>>     tmp = jules[xrange(month, numyears * nummonths, nummonths),VAR,:,0]
> >> >>>     tmp[tmp < 0.0] = np.nan
> >> >>>     data[month,:] = np.mean(tmp, axis=0)
> >> >>>
> >> >>>
> >> >>>
> >> >>>
> >> >>> Benjamin Root-2 wrote:
> >> >>> >
> >> >>> > If you want an average for each month from your timeseries, then the
> >> >>> > sneaky way would be to reshape your array so that the time dimension is
> >> >>> > split into two (month, year) dimensions.
> >> >>> >
> >> >>> > For a 1-D array, this would be:
> >> >>> >
> >> >>> >> dataarray = numpy.mod(numpy.arange(36), 12)
> >> >>> >> print dataarray
> >> >>> > array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 0, 1, 2, 3, 4,
> >> >>> > 5, 6, 7, 8, 9, 10, 11, 0, 1, 2, 3, 4, 5, 6, 7, 8, 9,
> >> >>> > 10, 11])
> >> >>> >> datamatrix = dataarray.reshape((-1, 12))
> >> >>> >> print datamatrix
> >> >>> > array([[ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
> >> >>> > [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11],
> >> >>> > [ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11]])
> >> >>> >
> >> >>> > Hope that helps.
> >> >>> >
> >> >>> > Ben Root
> >> >>> >
> >> >>> >
> >> >>> > On Fri, May 28, 2010 at 3:28 PM, mdekauwe <mdekauwe@gmail.com> wrote:
> >> >>> >
> >> >>> >>
> >> >>> >> OK so I just need to have a quick loop across the 12 months then, that is
> >> >>> >> fine, just thought there might have been a sneaky way!
> >> >>> >>
> >> >>> >> Really appreciated, getting there slowly!
> >> >>> >>
> >> >>> >>
> >> >>> >>
> >> >>> >> josef.pktd wrote:
> >> >>> >> >
> >> >>> >> > On Fri, May 28, 2010 at 4:14 PM, mdekauwe <mdekauwe@gmail.com> wrote:
> >> >>> >> >>
> >> >>> >> >> ok - something like this then...but how would i get the index for the
> >> >>> >> >> month for the data array (where month is 0, 1, 2, 3 ... 11)?
> >> >>> >> >>
> >> >>> >> >> data[month,:] = array[xrange(0, numyears * nummonths, nummonths),VAR,:,0]
> >> >>> >> >
> >> >>> >> > you would still need to start at the right month
> >> >>> >> > data[month,:] = array[xrange(month, numyears * nummonths, nummonths),VAR,:,0]
> >> >>> >> > or
> >> >>> >> > data[month,:] = array[month : numyears * nummonths : nummonths,VAR,:,0]
> >> >>> >> >
> >> >>> >> > an alternative would be a reshape with an extra month dimension and
> >> >>> >> > then sum only once over the year axis. this might be faster but
> >> >>> >> > trickier to get the correct reshape.
> >> >>> >> >
> >> >>> >> > Josef
> >> >>> >> >
> >> >>> >> >>
> >> >>> >> >> and would that be quicker than making an array months...
> >> >>> >> >>
> >> >>> >> >> months = np.arange(numyears * nummonths)
> >> >>> >> >>
> >> >>> >> >> and use that instead like you suggested x[start:end:12,:]?
> >> >>> >> >>
> >> >>> >> >> Many thanks again...
> >> >>> >> >>
> >> >>> >> >>
> >> >>> >> >> josef.pktd wrote:
> >> >>> >> >>>
> >> >>> >> >>> On Fri, May 28, 2010 at 3:53 PM, mdekauwe <mdekauwe@gmail.com>
>