[SciPy-User] re[SciPy-user] moving for loops...

Benjamin Root ben.root@ou....
Thu Jun 10 16:08:00 CDT 2010


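Good!  The -1 in the reshape means "however many it takes": NumPy works out
that dimension from the total size of the array and the other dimensions you
gave, so here it is the same as writing 11 (numyears) explicitly.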
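
As a small sketch of that point (the sizes 11 and 12 are taken from this thread;
the names series, inferred and explicit are made up for illustration), the
inferred and the explicit reshape should give identical arrays, and reshape
should refuse a length that does not divide evenly by 12:

```python
import numpy as np

# Assumed sizes from the thread: 11 years of 12 monthly values.
num_years, num_months = 11, 12
series = np.arange(num_years * num_months)

# -1 lets NumPy infer the first dimension from the total size (132 / 12 = 11).
inferred = series.reshape((-1, num_months))
# Spelling the dimension out by hand gives the same result.
explicit = series.reshape((num_years, num_months))

print(inferred.shape)
print(np.array_equal(inferred, explicit))

# The inference only works when the total size divides evenly by the
# known dimensions; 130 is not a multiple of 12, so this raises ValueError.
try:
    np.arange(130).reshape((-1, num_months))
except ValueError as err:
    print("reshape failed:", err)
```

The practical advantage of -1 is that the same line keeps working if more
years of data are appended later.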

Of course, as always when developing code that uses reshape operations, do
some "sanity checks" to make sure that your array was reshaped the way you
expect.  With arrays of many dimensions, it is very easy to make a mistake
with reshape.  Print out a few slices of the array and/or see if the
averages make sense until you are convinced that you coded it correctly.
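
One possible way to do such a check, sketched here on a random array with the
same shape as the one in the quoted example (the seed, the chosen year/month
and the variable names are assumptions for illustration only), is to compare
the reshaped result against plain strided slices of the original array:

```python
import numpy as np

# Random stand-in data shaped like the quoted example: 132 time steps
# (11 years x 12 months) and three further dimensions.
rng = np.random.default_rng(0)
num_years, num_months = 11, 12
jules = rng.random((num_years * num_months, 10, 50, 3))

# Split the time axis into (year, month) and average over the years.
jules_5d = jules.reshape((-1, num_months) + jules.shape[1:])
monthly_means = jules_5d.mean(axis=0)

# Check 1: element [year, month] of the reshaped array should be the
# original time step number year * 12 + month.
year, month = 3, 7
assert np.array_equal(jules_5d[year, month], jules[year * num_months + month])

# Check 2: each monthly mean should match the mean over every 12th
# time step of the original array, starting at that month.
for month in range(num_months):
    expected = jules[month::num_months].mean(axis=0)
    assert np.allclose(monthly_means[month], expected)

print(monthly_means.shape)
```

If either assertion fails, the axes were most likely split in the wrong order.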

Ben Root

On Thu, Jun 10, 2010 at 3:36 PM, mdekauwe <mdekauwe@gmail.com> wrote:

>
> OK I think it is clear now!! Although what does the -1 bit do, this is surely
> the same as saying 11, 12 or numyears, nummonths?
>
> thanks.
>
>
>
> Benjamin Root-2 wrote:
> >
> > Well, let's try a more direct example.  I am going to create a 4d array of
> > random values to illustrate.  I know the length of the dimensions won't be
> > exactly the same as yours, but the example will still be valid.
> >
> > In this example, I will be able to calculate *all* of the monthly averages
> > for *all* of the variables for *all* of the grid points without a single
> > loop.
> >
> >> jules = np.random.random((132, 10, 50, 3))
> >> print jules.shape
> > (132, 10, 50, 3)
> >
> >> jules_5d = np.reshape(jules, (-1, 12) + jules.shape[1:])
> >> print jules_5d.shape
> > (11, 12, 10, 50, 3)
> >
> >> jules_5d = np.ma.masked_array(jules_5d, mask=jules_5d < 0.0)
> >
> >> jules_means = np.mean(jules_5d, axis=0)
> >> print jules_means.shape
> > (12, 10, 50, 3)
> >
> > voila! This matrix has a mean for each month across all eleven years for
> > each datapoint in each of the 10 variables at each (I am assuming) level in
> > the atmosphere.
> >
> > So, if you want to operate on a subset of your jules matrix (for example,
> > you need to do special masking for each variable), then you can just work
> > off of a slice of the original matrix, and many of these same concepts in
> > this example and the previous example still apply.
> >
> > Ben Root
> >
> >
> > On Thu, Jun 10, 2010 at 1:08 PM, mdekauwe <mdekauwe@gmail.com> wrote:
> >
> >>
> >> Hi,
> >>
> >> No if I am honest I am a little confused how what you are suggesting would
> >> work. As I see it the array I am trying to average from has dims
> >> jules[(numyears * nummonths),1,numpts,0]. Where the first dimension (132) is
> >> 12 months x 11 years. And as I said before I would like to average the jan
> >> from the first, second, third years etc. Then the same for the feb and so
> >> on.
> >>
> >> So I don't see how you get to your 2d array that you mention in the first
> >> line? I thought what you were suggesting was I could precompute the step
> >> that builds the index for the months e.g
> >>
> >> mth_index = np.zeros(0)
> >> for month in xrange(nummonths):
> >>     mth_index = np.append(mth_index, np.arange(month, numyears * nummonths, nummonths))
> >>
> >> and use this as my index to skip the for loop. Though I still have a for
> >> loop I guess!
> >>
> >>
> >>
> >>
> >>
> >>
> >> Benjamin Root-2 wrote:
> >> >
> >> > Correction for me as well.  To mask out the negative values, use masked
> >> > arrays.  So we will turn jules_2d into a masked array (second line), then
> >> > all subsequent commands will still work as expected.  It is very similar to
> >> > replacing negative values with nans and using nanmin().
> >> >
> >> >> jules_2d = jules.reshape((-1, 12))
> >> >> jules_2d = np.ma.masked_array(jules_2d, mask=jules_2d < 0.0)
> >> >> jules_monthly = np.mean(jules_2d, axis=0)
> >> >> print jules_monthly.shape
> >> >   (12,)
> >> >
> >> > Ben Root
> >> >
> >> > On Tue, Jun 8, 2010 at 7:49 PM, Benjamin Root <ben.root@ou.edu>
> wrote:
> >> >
> >> >> The np.mod in my example caused the data points to stay within [0, 11] in
> >> >> order to illustrate that these are months.  In my example, months are
> >> >> columns, years are rows.  In your desired output, months are rows and years
> >> >> are columns.  It makes very little difference which way you have it.
> >> >>
> >> >> Anyway, let's imagine that we have a time series of data "jules".  We can
> >> >> easily reshape this like so:
> >> >>
> >> >> > jules_2d = jules.reshape((-1, 12))
> >> >> > jules_monthly = np.mean(jules_2d, axis=0)
> >> >> > print jules_monthly.shape
> >> >>   (12,)
> >> >>
> >> >> voila!  You have 12 values in jules_monthly which are the means for each
> >> >> month across all years.
> >> >>
> >> >> protip - if you want yearly averages just change the axis parameter in
> >> >> np.mean():
> >> >> > jules_yearly = np.mean(jules_2d, axis=1)
> >> >>
> >> >> I hope that makes my previous explanation clearer.
> >> >>
> >> >> Ben Root
> >> >>
> >> >>
> >> >> On Tue, Jun 8, 2010 at 5:41 PM, mdekauwe <mdekauwe@gmail.com> wrote:
> >> >>
> >> >>>
> >> >>> OK...
> >> >>>
> >> >>> but if I do...
> >> >>>
> >> >>> In [28]: np.mod(np.arange(nummonths*numyears), nummonths).reshape((-1, nummonths))
> >> >>> Out[28]:
> >> >>> array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11],
> >> >>>        [ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11],
> >> >>>        [ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11],
> >> >>>        [ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11],
> >> >>>        [ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11],
> >> >>>        [ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11],
> >> >>>        [ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11],
> >> >>>        [ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11],
> >> >>>        [ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11],
> >> >>>        [ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11],
> >> >>>        [ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11]])
> >> >>>
> >> >>> When really I would be after something like this I think?
> >> >>>
> >> >>> array([[  0,  12,  24,  36,  48,  60,  72,  84,  96, 108, 120],
> >> >>>        [  1,  13,  25,  37,  49,  61,  73,  85,  97, 109, 121],
> >> >>>        [  2,  14,  26,  38,  50,  62,  74,  86,  98, 110, 122],
> >> >>>        etc, etc
> >> >>>
> >> >>> i.e. so for each month jump across the years.
> >> >>>
> >> >>> Not quite sure of this example...this is what I currently have which does
> >> >>> seem to work, though I guess not completely efficiently.
> >> >>>
> >> >>> for month in xrange(nummonths):
> >> >>>     tmp = jules[xrange(month, numyears * nummonths, nummonths),VAR,:,0]
> >> >>>     tmp[tmp < 0.0] = np.nan
> >> >>>     data[month,:] = np.mean(tmp, axis=0)
> >> >>>
> >> >>>
> >> >>>
> >> >>>
> >> >>> Benjamin Root-2 wrote:
> >> >>> >
> >> >>> > If you want an average for each month from your timeseries, then the
> >> >>> > sneaky way would be to reshape your array so that the time dimension is
> >> >>> > split into two (month, year) dimensions.
> >> >>> >
> >> >>> > For a 1-D array, this would be:
> >> >>> >
> >> >>> >> dataarray = numpy.mod(numpy.arange(36), 12)
> >> >>> >> print dataarray
> >> >>> > array([ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11,  0,  1,  2,  3,  4,
> >> >>> >         5,  6,  7,  8,  9, 10, 11,  0,  1,  2,  3,  4,  5,  6,  7,  8,  9,
> >> >>> >        10, 11])
> >> >>> >> datamatrix = dataarray.reshape((-1, 12))
> >> >>> >> print datamatrix
> >> >>> > array([[ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11],
> >> >>> >        [ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11],
> >> >>> >        [ 0,  1,  2,  3,  4,  5,  6,  7,  8,  9, 10, 11]])
> >> >>> >
> >> >>> > Hope that helps.
> >> >>> >
> >> >>> > Ben Root
> >> >>> >
> >> >>> >
> >> >>> > On Fri, May 28, 2010 at 3:28 PM, mdekauwe <mdekauwe@gmail.com>
> >> wrote:
> >> >>> >
> >> >>> >>
> >> >>> >> OK so I just need to have a quick loop across the 12 months then, that is
> >> >>> >> fine, just thought there might have been a sneaky way!
> >> >>> >>
> >> >>> >> Really appreciated, getting there slowly!
> >> >>> >>
> >> >>> >>
> >> >>> >>
> >> >>> >> josef.pktd wrote:
> >> >>> >> >
> >> >>> >> > On Fri, May 28, 2010 at 4:14 PM, mdekauwe <mdekauwe@gmail.com>
> >> >>> wrote:
> >> >>> >> >>
> >> >>> >> >> ok - something like this then...but how would i get the index for the
> >> >>> >> >> month for the data array (where month is 0, 1, 2, 3 ... 11)?
> >> >>> >> >>
> >> >>> >> >> data[month,:] = array[xrange(0, numyears * nummonths, nummonths),VAR,:,0]
> >> >>> >> >
> >> >>> >> > you would still need to start at the right month
> >> >>> >> > data[month,:] = array[xrange(month, numyears * nummonths, nummonths),VAR,:,0]
> >> >>> >> > or
> >> >>> >> > data[month,:] = array[month : numyears * nummonths : nummonths,VAR,:,0]
> >> >>> >> >
> >> >>> >> > an alternative would be a reshape with an extra month dimension and
> >> >>> >> > then sum only once over the year axis. this might be faster but
> >> >>> >> > trickier to get the correct reshape.
> >> >>> >> >
> >> >>> >> > Josef
> >> >>> >> >
> >> >>> >> >>
> >> >>> >> >> and would that be quicker than making an array months...
> >> >>> >> >>
> >> >>> >> >> months = np.arange(numyears * nummonths)
> >> >>> >> >>
> >> >>> >> >> and use that instead like you suggested x[start:end:12,:]?
> >> >>> >> >>
> >> >>> >> >> Many thanks again...
> >> >>> >> >>
> >> >>> >> >>
> >> >>> >> >> josef.pktd wrote:
> >> >>> >> >>>
> >> >>> >> >>> On Fri, May 28, 2010 at 3:53 PM, mdekauwe <
> mdekauwe@gmail.com>
>