[Numpy-discussion] numpythonically getting elements with the minimum sum
Lluís
xscript@gmx....
Tue Jan 29 09:56:47 CST 2013
Sebastian Berg writes:
> On Tue, 2013-01-29 at 14:53 +0100, Lluís wrote:
>> Gregor Thalhammer writes:
>>
>> > Am 28.1.2013 um 23:15 schrieb Lluís:
>>
>> >> Hi,
>> >>
>> >> I have a somewhat convoluted N-dimensional array that contains information about a
>> >> set of experiments.
>> >>
>> >> The last dimension has as many entries as iterations in the experiment (an
>> >> iterative application), and the penultimate dimension has as many entries as
>> >> times I have run that experiment; the remaining dimensions describe the features
>> >> of the experiment:
>> >>
>> >> data.shape == (... arbitrary number of dimensions ..., NUM_RUNS, NUM_ITERATIONS)
>> >>
>> >> So, what I want is to get the data for the best run of each experiment:
>> >>
>> >> best.shape == (... arbitrary number of dimensions ..., NUM_ITERATIONS)
>> >>
>> >> by selecting, for each experiment, the run with the lowest total time (the sum of
>> >> the times of all iterations of that run).
>> >>
>> >>
>> >> So far I've got the trivial part, but not the final indexing into "data":
>> >>
>> >> dsum = data.sum(axis = -1)
>> >> dmin = dsum.min(axis = -1)
>> >> best = data[???]
>> >>
>> >>
>> >> I'm sure there must be some numpythonic and generic way to get what I want, but
>> >> fancy indexing is beating me here :)
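To pin down the intended result, here is a slow but unambiguous loop version. It is a sketch, not code from the thread: the name `best_run_loop` is hypothetical, and it assumes a plain numeric array rather than the structured array used later in the thread.

```python
import numpy as np


def best_run_loop(data):
    """Hypothetical reference version: for each experiment, keep the run
    whose iterations have the lowest total.

    Assumes a plain numeric array shaped (..., NUM_RUNS, NUM_ITERATIONS).
    """
    lead_shape = data.shape[:-2]
    best = np.empty(lead_shape + data.shape[-1:], dtype=data.dtype)
    # np.ndindex walks every combination of the leading (experiment) indices;
    # for a 2-D input it yields a single empty tuple.
    for idx in np.ndindex(*lead_shape):
        runs = data[idx]                              # (NUM_RUNS, NUM_ITERATIONS)
        best[idx] = runs[runs.sum(axis=-1).argmin()]  # row with the smallest sum
    return best
```

Any vectorised indexing expression should agree with this loop on every input.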
>>
>> > Did you have a look at the argmin function? It returns the indices of the minimum values along an axis. Untested guess:
>>
>> > dmin_idx = np.argmin(dsum, axis = -1)
>> > best = data[..., dmin_idx, :]
>>
>> Ah, sorry, my example is incorrect. I was actually using 'argmin', but indexing
>> with it does not exactly work as I expected:
>>
>> >>> d1.shape
>> (2, 5, 10)
>> >>> dsum = d1.sum(axis = -1)
>> >>> dmin = dsum.argmin(axis = -1)
>> >>> dmin.shape
>> (2,)
>> >>> d1_best = d1[...,dmin,:]
> You need to use fancy indexing. Something like:
>>>> d1_best = d1[np.arange(2), dmin,:]
> This is because the Ellipsis takes everything along the axis, while you want to
> pick from multiple axes at the same time. That can be achieved with
> fancy indexing (indexing with arrays). From another perspective, you
> want to get rid of two axes in favor of a new one, but a slice/Ellipsis
> always preserves the axis it works on.
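A small self-contained check of the difference described above, using random data (the contents and seed are arbitrary; only the shapes and the selected rows matter):

```python
import numpy as np

rng = np.random.default_rng(0)
d1 = rng.random((2, 5, 10))               # (experiments, runs, iterations)
dmin = d1.sum(axis=-1).argmin(axis=-1)    # best run per experiment, shape (2,)

# The Ellipsis keeps axis 0 whole, so dmin selects both runs for every experiment:
# outer[i, j] is d1[i, dmin[j]].
outer = d1[..., dmin, :]                  # shape (2, 2, 10)

# Pairing np.arange(2) with dmin selects one (experiment, run) pair per entry:
# paired[i] is d1[i, dmin[i]].
paired = d1[np.arange(2), dmin, :]        # shape (2, 10)
```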
Nice, thanks. That works for this specific example, but I couldn't get it to
work with "d1.shape == (1, 2, 16, 5, 10)" (thus "dmin.shape == (1, 2, 16)"):
>>> def get_best_run (data, field):
...     """Returns the best run."""
...     data = data.view(np.ndarray)
...     assert data.ndim >= 2
...     dsum = data[field].sum(axis=-1)
...     dmin = dsum.argmin(axis=-1)
...     idxs = [ np.arange(dlen) for dlen in data.shape[:-2] ]
...     idxs += [ dmin ]
...     idxs += [ slice(None) ]
...     return data[tuple(idxs)]
>>> d1.shape
(2, 5, 10)
>>> get_best_run(d1, "time").shape
(2, 10)
>>> d2.shape
(1, 2, 16, 5, 10)
>>> get_best_run(d2, "time")
Traceback (most recent call last):
...
  File "./plot-user.py", line 89, in get_best_run
    res = data.view(np.ndarray)[tuple(idxs)]
ValueError: shape mismatch: objects cannot be broadcast to a single shape
After reading the "Advanced indexing" section, my understanding is that the
elements in "idxs" are not broadcastable to the same shape, but I'm not sure how
to build them so that they broadcast to a common shape.
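The mismatch can be reproduced from the index shapes alone. A sketch, assuming NumPy 1.20 or later for `np.broadcast_shapes`; `lead_shape` stands in for `d2.shape[:-2]`:

```python
import numpy as np

lead_shape = (1, 2, 16)                      # d2.shape[:-2]
idxs = [np.arange(n) for n in lead_shape]    # 1-D arrays of shapes (1,), (2,), (16,)
dmin_shape = lead_shape                      # argmin over the runs leaves (1, 2, 16)

shapes = [i.shape for i in idxs] + [dmin_shape]
try:
    np.broadcast_shapes(*shapes)
    broadcast_ok = True
except ValueError:
    # Every 1-D arange is aligned with the *last* axis, so lengths 2 and 16 collide.
    broadcast_ok = False

# Giving each index array its length on its own axis (and 1 elsewhere) fixes it.
open_shapes = [tuple(n if k == axis else 1 for k in range(len(lead_shape)))
               for axis, n in enumerate(lead_shape)]
common = np.broadcast_shapes(*open_shapes, dmin_shape)   # (1, 2, 16)
```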
Thanks a lot,
Lluis
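One way to build index arrays that broadcast correctly is `np.ix_`, which turns N 1-D arrays into an open mesh: the k-th result has its length on axis k and 1 on every other axis, so together they broadcast against `dmin`. The following is a sketch of a modified `get_best_run` along those lines, not code posted in the thread; like the original, it assumes a structured array and the name of the field to minimise over.

```python
import numpy as np


def get_best_run(data, field):
    """Return the best run (lowest data[field] total) for every experiment.

    Assumes a structured array shaped (..., NUM_RUNS, NUM_ITERATIONS).
    """
    data = data.view(np.ndarray)
    assert data.ndim >= 2
    dsum = data[field].sum(axis=-1)          # shape (..., NUM_RUNS)
    dmin = dsum.argmin(axis=-1)              # shape (...)
    # Open-mesh index arrays for the leading axes; they broadcast to dmin.shape.
    lead = np.ix_(*[np.arange(n) for n in data.shape[:-2]])
    # Leading axes and the run axis are consumed by the (broadcast) index
    # arrays; the slice keeps all iterations.
    return data[lead + (dmin, slice(None))]
```

With this version, `get_best_run(d2, "time").shape` should be `(1, 2, 16, 10)`. For a 3-D input the mesh reduces to a single `(2,)` array, which is exactly the `np.arange(2)` from the fancy-indexing suggestion.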
>> >>> d1_best.shape
>> (2, 2, 10)
>>
>>
>> Assuming the 1st dimension is the test, the 2nd the run and the 3rd the iterations,
>> and using the previous code with some example values:
>>
>> >>> dmin
>> [4 3]
>> >>> d1_best
>> [[[ ... contents of d1[0,4,:] ...]
>>   [ ... contents of d1[0,3,:] ...]]
>>  [[ ... contents of d1[1,4,:] ...]
>>   [ ... contents of d1[1,3,:] ...]]]
>>
>>
>> While I actually want this:
>>
>> [[ ... contents of d1[0,4,:] ...]
>>  [ ... contents of d1[1,3,:] ...]]
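NumPy 1.15 and later also provide `np.take_along_axis`, which pairs the leading indices internally. A sketch for a plain numeric array (the function name is hypothetical; for a structured array one would compute `dmin` from `data[field]`, as `get_best_run` does):

```python
import numpy as np


def best_run_take_along(data):
    """Hypothetical helper: select, along axis -2, the run whose iterations
    sum to the least.

    Assumes a plain numeric array shaped (..., NUM_RUNS, NUM_ITERATIONS).
    """
    dmin = data.sum(axis=-1).argmin(axis=-1)          # shape (...)
    # take_along_axis needs an index array with data's ndim: add a length-1
    # run axis and a length-1 iteration axis (broadcast over all iterations).
    idx = dmin[..., np.newaxis, np.newaxis]           # shape (..., 1, 1)
    picked = np.take_along_axis(data, idx, axis=-2)   # shape (..., 1, NUM_ITERATIONS)
    return picked[..., 0, :]                          # drop the length-1 run axis
```

Here the index arrays for the leading axes are built inside `take_along_axis`; the only bookkeeping left is adding and then dropping the length-1 run axis.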
--
"And it's much the same thing with knowledge, for whenever you learn
something new, the whole world becomes that much richer."
-- The Princess of Pure Reason, as told by Norton Juster in The Phantom
Tollbooth