[Numpy-discussion] convert strides/shape/offset into nd index?
Dag Sverre Seljebotn
Tue Dec 1 04:17:42 CST 2009
Anne Archibald wrote:
> 2009/11/30 James Bergstra <firstname.lastname@example.org>:
>> Your question involves a few concepts:
>> - an integer vector describing the position of an element
>> - the logical shape (another int vector)
>> - the physical strides (another int vector)
>> Ignoring the case of negative offsets, a physical offset is the inner
>> product of the physical strides with the position vector.
>> In these terms, you are asking how to solve the inner-product equation
>> for the position vector. There can be many possible solutions (like,
>> if there is a stride of 1, then you can make that dimension account
>> for the entire offset. This is often not the solution you want.).
>> For valid ndarrays though, there is at most one solution though with
>> the property that every position element is less than the shape.
>> You will also need to take into account that for certain stride
>> vectors, there is no way to get certain offsets. Imagine all the
>> strides were even, and you needed to get at an odd offset... it would
>> be impossible. It would even be impossible if there were a dimension
>> with stride 1 but it had shape of 1 too.
>> I can't think of an algorithm off the top of my head that would do
>> this in a quick and elegant way.
> Not to be a downer, but this problem is technically NP-complete. The
> so-called "knapsack problem" is to find a subset of a collection of
> numbers that adds up to the specified number, and it is NP-complete.
> Unfortunately, it is exactly what you need to do to find the indices
> to a particular memory location in an array of shape (2,2,...,2).
> What that means in practice is that either you have to allow
> potentially very slow algorithms (though you know that there will
> never be more than 32 different values in the knapsack, which might or
> might not be enough to keep things tractable) or use heuristic
> algorithms that don't always work. There are probably fairly good
> heuristics, particularly if the array elements are all at distinct
> memory locations (arrays with overlapping elements can arise from
> broadcasting and other slightly more arcane operations).
Not that this should be done, but getting a chance to discuss NP is
I think this particular problem can be solved in O(d*n^2) or better,
where n is the offset in question and d the number of dimensions of the
array, by using dynamic programming on the buffer offset in question (so
first try for offset 1, then 2, and so on up to n).
Which doesn't contradict the fact that the problem is exponential (n is
exponential in terms of the length of the input to the problem), but it
is still not *too* bad in many cases, because the exponential term is
always smaller than the size of the array.
More information about the NumPy-Discussion